Unit 11 


Testing new drugs 


Introduction 


Curiosity is one of the most striking features of human behaviour. People 
are constantly meddling with things, taking them to pieces, trying to find 
out how they work, and seeing what happens when they are altered. This 
curiosity has proved to be immensely fruitful, for on it depends much of 
present-day scientific and technological knowledge. Human beings 
experiment: they actively interfere with their surrounding world, and as a 
result they learn about its properties and learn to manipulate it to their 
own ends. It is the theme of experimentation that forms the subject 
matter of this unit. 


The focus in this unit is on one particular aspect of medicine: drug-testing.
This is one of the most important areas in which scientific experimentation
has been applied. Experiments were discussed quite generally in Unit 10
and, as in that unit, in parts we will concentrate on collecting data
(stage 2 of the statistical modelling diagram). However, we shall also use
techniques and concepts from earlier units to analyse the data, and we will
see that the method of analysis to be used must be taken into account
when an experiment is being planned. In drug-testing, and in the whole of
scientific experimentation, statistics play a very important role. It is this
role that is described in this unit. Also, since an experiment is performed
to answer specific questions about the real world, its results must be
interpreted in terms of what they say about that world. So this unit also
provides further examples of how the statistical ideas you have learnt in
M140 fit into the process of making decisions in the real world.


Hundreds of new drugs are put on the market every year. Their
manufacturers usually claim that the new product is better than existing
ones, either because it is more effective or because it has fewer side effects.
But how does a manufacturer discover what the effects of a new drug
really are? With the health of thousands or millions of potential consumers
at stake, there is certainly no room for serious mistakes. This means that
each new drug has to be thoroughly tested, both to ensure that it does 
have the desired effect, and also to ensure that it does not have undesired 
side effects. Drug manufacturers therefore have to carry out numerous 
experiments in which they test the effects of new drugs, usually first on 
animals, or on tissues taken from animals, and then on human beings. 
After each stage in the drug-testing procedure, they have to decide 
whether to continue with the next stage, or whether to reject the new 
drug. The nature of their decision will be determined in part by the 
statistical information that they receive about the effects of the drug, as 
revealed by the tests they have conducted. Examples of questions that a 
drug manufacturer may ask are as follows. 


• What is the average effect of the new drug on, say, body weight?

• How variable is this effect?

• How does the effect compare with that of other well-known and
well-tested drugs?
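

As a minimal sketch of how the three questions above might be answered from test results, the following Python fragment computes an average effect, a measure of its variability, and a comparison with an existing drug. The weight changes are invented purely for illustration, and the names `new_drug`, `existing_drug` and `summarise` are hypothetical; the sample standard deviation is used here as one possible measure of variability.

```python
from statistics import mean, stdev

# Hypothetical changes in body weight (kg) after a course of treatment.
# These figures are invented for illustration only.
new_drug = [-2.1, -0.4, -3.0, -1.2, 0.3, -2.5, -1.8, -0.9]
existing_drug = [-1.0, 0.2, -1.5, -0.3, 0.6, -1.1, -0.8, 0.1]


def summarise(changes):
    """Return the average effect and its variability (sample standard deviation)."""
    return mean(changes), stdev(changes)


new_mean, new_sd = summarise(new_drug)
old_mean, old_sd = summarise(existing_drug)

print(f"New drug:      mean change {new_mean:.2f} kg, standard deviation {new_sd:.2f} kg")
print(f"Existing drug: mean change {old_mean:.2f} kg, standard deviation {old_sd:.2f} kg")
print(f"Difference in mean change: {new_mean - old_mean:.2f} kg")
```

A summary of this kind is only one input to the decision about whether to continue testing.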


Of course, many other factors will also be taken into account in deciding
whether to continue with further stages of testing: for example, the cost 
involved, the seriousness and prevalence of the medical conditions for 
which the drug might be used, and the availability of existing alternatives. 


Statistical information has a role in making practical decisions in various 
aspects of everyday life. Almost always, as in drug-testing, statistical 
information is just one among many factors that have to be taken into 
account when coming to a decision. You will see how the relative 
importance of statistical and other sources of information should be 
assessed in decision-making. This unit is concerned with the practical side 
of statistics and decision-making, but in a rather narrow fashion. If a drug 
company (and the public) is to have any confidence in the data that 
emerges from the tests it has carried out on a new drug, then those tests 
must be carried out according to certain rules. If these rules are not 
observed, then the results of the tests could easily be worthless. We shall 
describe these rules in some detail. 


Much of this unit is devoted to a detailed discussion of the principles of 
drug-testing but, before this, in Section 1 we outline the various stages of 
testing a new drug and getting it licensed for use. This first section 
provides only background information, so you do not have to remember its 
details, nor will it be assessed, but you may wish to refer back to it when 
studying the rest of the unit. Section 2 describes the principles and 
methods used in clinical trials, which are one of the important methods of 
testing drugs on human patients. Then Section 3 looks in detail at the 
design of some clinical trials. This leads naturally to the analysis of the 
data obtained from such trials, and Section 4 shows the close relationship 
between this analysis and the design of the trial. We then consider, in 
Section 5, some of the limitations of clinical trials and the further work 
that needs to be undertaken to ensure that the drugs which are marketed 
are as beneficial as possible to society. In Section 6, the whole process of 
testing and launching a new drug is illustrated by a case study. Finally, 
Section 7 directs you to the Computer Book. 


This unit, unlike the others in M140, contains many references to its 
sources of information in ‘Harvard style’. This means that the full 
reference is in a separate References section towards the end of the unit, 
with only a brief mention such as ‘(Christiansen, 2006)’ or ‘SMC (2008)’ in 
the main unit text. You are not expected to follow up the detailed 
reference, but the information is there if you wish to. 


1 Drug research and testing 


Section 1 is for background information only and will not be 
assessed. 


A new drug goes through a long series of tests on animals and human 
beings before it can be put on the market. The first stage is screening. 
Large numbers of chemical compounds are synthesised in the laboratories 
of drug companies, and it is necessary to identify those which have 
properties that are likely to be of interest. 


Tests are devised which, it is hoped, will predict whether each compound 
has the desired features. The tests may use bacteria grown in cultures in 
the laboratory, cells or other material taken from animals or humans, or 
they may be tests on animals. In view of the large numbers of compounds 
involved, these tests must be relatively quick and easy to perform. Many 
compounds are rejected at this stage, but the performance of some 
indicates that they merit further investigation. 


Screening is followed by a much fuller series of tests on those compounds 
that seem to be of interest. Many aspects of the action of the drug will be 
looked at. Most drugs have both desirable and undesirable (or toxic) 
effects. Tests at this stage will indicate both the size of dose of the drug 
needed to produce the desirable effects and the size of dose that produces 
the toxic effects. There will also be tests of the possibility that the 
compound may cause cancer or birth defects. 


1.1 The phases of drug-testing 


If the drug seems to have desirable effects, if toxic actions are only found 
at doses much higher than those required to produce these desirable 
effects, and if there is no tendency to cause cancer or defective offspring, 
then tests in human beings may be started. Conventionally, the tests on 
human beings are divided into four phases. 


Phase 1: Early clinical pharmacology 


The first people to take the drug will normally be healthy volunteers 
rather than patients, and be closely monitored 24 hours a day within a 
clinic. The drug will initially be given in single doses that are smaller than 
those expected to be effective. The biological action and safety of the drug 
will be evaluated, before the dose is gradually increased. In some of these 
studies, the amount of the drug in the bloodstream or urine will be 
measured, so that the rate of absorption of the drug, and its rate of 
elimination from the body, can be assessed. Later on, studies will be 
carried out in which the volunteers take the drug repeatedly over a period 
of time. 


Phase 2: Early clinical investigations 


These are usually the first studies involving patients with the condition the 
drug is intended to treat. The aims here are as follows: to get an idea of 
the best dose to use; and to study the efficacy of the drug, that is, its
ability to have the effects it is designed to produce. It may be that 
particular symptoms of the disease respond well to the drug, or that a 
certain type of patient responds better than others. Thus this phase is 
largely concerned with forming hypotheses about the action of the drug. 


Phase 3: Comparative studies 


In this phase the hypotheses formed in phase 2 are tested. To do this, 
comparative studies, called clinical trials, are needed. Treatment with 
the new drug is compared with existing therapy or, sometimes, with no 
treatment. A series of different dosages and dosage schedules (i.e. how 
often the drug is taken) are compared. A wider variety of patients will also 
be treated, including, for example, elderly patients and those suffering 
from more than one disease. This phase completes the work necessary to 
register the drug (i.e. to obtain permission to market it). 


Phase 4: Post-marketing studies 


This phase includes studies to provide further evidence about the safety of 
the drug. These use a larger sample of patients than can be obtained 
before marketing. It also includes various market research studies. 


Although both scientific experiments and statistical analysis are important 
during each of these four phases, they are particularly well illustrated in 
clinical trials carried out during phase 3 (and sometimes phase 2). This 
unit therefore concentrates on these clinical trials. 


1.2 The licensing of a new drug 


All medicines, whether available only on prescription or freely 
over-the-counter, must be licensed. A Europe-wide licence may be granted 
by the European Commission after evaluation by the Committee for
Medicinal Products for Human Use (CHMP) or another specialist 
committee within the European Medicines Agency (EMA). The CHMP is 
composed of one member from each EU country, all of whom are 
specialists in the field (including physicians, pharmacists, pharmacologists 
and toxicologists). The CHMP advises on whether a new drug should be 
licensed, basing its decisions on scientific criteria that determine whether 
new medicines meet safety, efficacy and quality requirements evidenced by 
clinical trial results. For each drug that is granted authorisation for a 
licence, the CHMP publishes a European public assessment report. This 
provides details of the assessment process and states the grounds on which 
the committee recommended authorisation. It also gives a summary of the 
product characteristics, the product’s labelling, and the patient 
information leaflet. 


The EMA is concerned not only with applications to market new drugs 
and to use old drugs for new purposes. It also has an important role in the 
monitoring of drugs that are already on the market. From time to time a 
doctor will come across a complaint, an illness or perhaps a death which he 
suspects may be caused by a drug. There is a Europe-wide database, 
EudraVigilance, where reports of adverse drug reactions are held. These
reports are carefully monitored, and where necessary, the EMA advises the 
European Commission to change a medicine’s licence. These methods of 
gathering information about the unwanted effects of drugs are discussed in 
more detail in Section 5. 


Once a drug has been launched, doctors are allowed to prescribe it to 
patients who meet the criteria of the licence. However, for prescribing in 
Britain under the National Health Service (NHS), a treatment for a specific 
condition must be approved by the National Institute for Health and Care 
Excellence (NICE) in England, Wales and Northern Ireland or the Scottish 
Medicines Consortium (SMC) in Scotland before doctors are freely allowed 
to prescribe it. These bodies evaluate the cost-effectiveness and efficacy of 
treatments to provide approval of, and guidance on, their use under the NHS.


Example 1 NICE and sorafenib 


The drug sorafenib (marketed as Nexavar), used in the treatment of liver 
and kidney cancers, was licensed by the European Commission in 2006. 
The licence was approved on evidence from two phase 3 clinical trials 
(EMA, 2007a and 2007b) comparing sorafenib with treatment containing 
no medication. The first trial involved 602 liver cancer patients. The 
second trial involved 903 kidney cancer patients in whom previous cancer 
treatment had stopped working. Sorafenib was shown to increase the 
length of time patients survived in both trials by an average of 2.8 and 3.4
months respectively. The drug was found to have several unwanted side
effects, but the benefit to patient survival was considered to outweigh the
risks.


Sorafenib is a very expensive treatment. The cost of treatment is not a 
consideration in the drug licensing approval process. However, when the 
drug was evaluated by NICE (2009, 2010) and SMC (2008), it was not 
found to be cost-effective, so sorafenib was not approved for prescribing 
under the NHS. 


In this section, we shall discuss in some detail the experiments that need
to be conducted to test drugs. As already mentioned, such experiments are 
called clinical trials. First we shall describe one of the first-ever clinical 
trials, carried out in 1747 by a physician serving in the Royal Navy,
James Lind (1716-1794).


Example 2 James Lind’s scurvy experiment


James Lind (see Lind, 1753) carried out an experiment to test various 
treatments for scurvy, a disease now known to result from vitamin C 
deficiency. Scurvy patients develop spots, spongy gums, weak knees and 
generally feel unwell. As scurvy advances, patients may develop open 
wounds with pus, jaundice, fever and loss of teeth, and eventually die. 
Scurvy was once common among sailors on board ships away at sea for 
periods longer than it was possible to store most fresh fruit and vegetables, 
and it resulted in many deaths. 


On 20 May 1747, James Lind was the surgeon on board HMS Salisbury
patrolling the Bay of Biscay. At the time, the cause of scurvy was 
unknown, but Lind had reviewed the available literature on scurvy to learn 
as much as he could. Lind took 12 patients with scurvy, as similar as 
possible, all with ‘putrid gums, the spots and lassitude and weakness of 
their knees’. They were fed the same general diet, but this was 
supplemented with an additional treatment: 


• two were given a quart of cider daily

• two were given 25 drops of elixir of vitriol (sulfuric acid) three times
per day

• two were given two spoonfuls of vinegar three times per day

• two were given half a pint of sea-water daily

• two were given two oranges and one lemon

• two were given a spicy paste, ‘the bigness of a nutmeg’, three times per day.


Lind observed ‘most sudden and visible good effects’ on the two patients 
given oranges and lemons; both were fit for duty at the end of six days. 


2.1 Drug-testing by experiment 


At first sight, nothing would seem simpler than testing the effects of a 
drug. Simply give a person suffering from the relevant disease a dose of the 
drug, and see what happens. There are many reasons — some obvious, 
others quite subtle — why such a simple procedure might not work. To 
discover some of these reasons, the following activities consider the use of 
aspirin to treat headaches. 


Activity 1 Was it the aspirins? 


Suppose that you have a headache, and take two aspirins. An hour later 
the headache has gone. What do you conclude about the effectiveness of 
aspirin as a pain-killer?


Activity 2 Any better?


Suppose that you took a group of 20 people, all with headaches, and gave 
them each two aspirin. An hour later, 16 of them said they had no 
headache. What do you now say about the effectiveness of aspirin as a 
pain-killer? 


Even without treatment, headaches will often go away after a while. To 
gain a better idea of whether aspirin helps headaches go away in the above 
type of test, you need another group of people with headaches who do not 
take aspirin. You can then compare the recovery rate of those who take 
aspirin with the recovery rate of those who do not. 
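

The comparison just described can be sketched in a few lines of Python. The aspirin figures are those of Activity 2 (16 of 20 people recovered); the figures for the group that did not take aspirin are hypothetical, invented only to show the calculation, and the function name `recovery_rate` is likewise an illustrative choice.

```python
def recovery_rate(recovered, total):
    """Proportion of a group whose headache had gone after one hour."""
    return recovered / total


# Aspirin group: the figures from Activity 2 (16 of 20 recovered).
aspirin_rate = recovery_rate(16, 20)

# Group without aspirin: hypothetical figures, invented for illustration.
no_aspirin_rate = recovery_rate(11, 20)

difference = aspirin_rate - no_aspirin_rate
print(f"Recovered with aspirin:       {aspirin_rate:.0%}")
print(f"Recovered without aspirin:    {no_aspirin_rate:.0%}")
print(f"Difference in recovery rates: {difference:.0%}")
```

It is the difference between the two rates, not the aspirin rate alone, that carries information about the drug.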


Such a comparison, between a group that does undergo some form of 
treatment and a group that does not, is fundamental to drug-testing. It is 
also fundamental to many other kinds of scientific investigation, as you 
have already seen in Unit 10. Such a comparison is an experiment. In such 
an experiment, the group that receives the experimental treatment is 
referred to as the experimental group, and the group that does not 
receive the experimental treatment is referred to as the control group. 


In an ideal experiment, the experimental group and the control group 
resemble each other in every respect except for the one being tested. If this 
ideal were achieved for the aspirin experiment, and you found that all of 
the experimental group (the aspirin-tested people) recovered from their 
headaches within an hour, whereas none of the control group did, then you 
could be virtually certain (barring an extraordinary fluke) that the aspirin 
had really cured the headaches. 


However, it is by no means an easy task to make sure that the 
experimental and control groups resemble each other in every respect apart 
from the one being tested. 


Activity 3 Using a control group 


Consider an experimental group of headache sufferers who have been given 
aspirin, and a control group who have not. How do the two groups differ 
other than in the presence or absence of the drug in the body? 


Treating a patient can quite commonly appear to have a therapeutic
effect (or can actually have a therapeutic effect), even when the
treatment contains no medication and should be ineffectual. This is
called the placebo effect.


[Cartoon caption: ‘I couldn't afford a control group so I decided to go with
an out-of-control group.’]


People have all sorts of expectations about medicines that they receive
from the doctor, as well as about other types of treatment. These are
derived from their own past experiences, from the attitudes of their friends
and family, and also from the attitude and expectations of the doctor.
These attitudes can have an important influence on the outcome of
treatment. This is by no means confined to neurotic or suggestible people, 
or to hypochondriacs. Placebo effects, as you might expect, tend to be 
most noticeable in mild conditions, but they cannot be neglected, even in 
quite severe conditions. Diabetes and asthma, for example, have both been 
shown to be sensitive to such effects. 


Activity 4 Controlling for the placebo effect 


How can the placebo effect be overcome in the experiment on aspirin in 
Activity 3? 


A dummy treatment that superficially resembles the treatment being 
tested but contains no active ingredient is called a placebo. 


A clinical trial (i.e. experiment) in which the control group takes a placebo 
is called a placebo-controlled trial. There is one ethical problem with 
placebo-controlled trials: patients go to their doctors to get treatment, not 
dummies. If you assume that the patients agree to take part in a 
drug-testing experiment, you may be able to think of circumstances where 
there would be no ethical problem in using a placebo as a control. Three 
such circumstances spring to mind: 


• Where no effective treatment exists for the disease in question.

• Where the condition is mild.

• Where the new treatment is an addition to an existing treatment so
that the comparison is between old treatment plus placebo and old
treatment plus new treatment.


Placebo-controlled trial 


In a placebo-controlled trial, there is a treatment group and a control 
group. People in the treatment group receive the treatment being 
tested, while those in the control group are given a placebo. 


In the case of a serious, or fairly serious illness, where an effective 
treatment already exists, there would obviously be no justification for 
leaving a person untreated. What should be done in such a case? The 
answer is that the existing treatment rather than a placebo should be 
given to the control group. 


Such questions are, of course, ethical and they can be answered only by 
taking up a particular ethical position. The position implied in the 
answers given above is that it is necessary to do the following: 


• Minimise the possibilities of causing harm and lack of benefit to a
person from a clinical trial.

• Maximise the knowledge gained about the effectiveness and risks of
the drugs.

• Make sure that the person being treated knows what is happening and
agrees to take part in the trial (this is called informed consent).


Some examples will illustrate these issues more clearly. 





Example 3 Smoking cessation therapies 


There have been many trials of smoking cessation therapies (treatments to 
help people stop smoking) that are placebo-controlled. Many smokers quit 
without treatment, and some study participants may be more likely to quit 
because they expect the new therapy to help. It has been argued that 
placebo-controlled trials are unethical because quitting smoking is an 
important change that improves health, and several well-accepted 
therapies have been shown to increase cessation rates. Others argue that 
most people in the control group would not take any therapy to help them 
stop smoking if they were not in the study, and a placebo effect could help 
them quit smoking. 


Example 4 Common cold 


The common cold is incurable in the sense that no known drug can kill the 
viruses that cause it. On the other hand, drugs are available that can 
alleviate the symptoms (such as sneezing, headache, watering eyes, blocked 
nose). If a drug company develops a new drug that they think will kill the 
virus, then they would want to test it on an experimental group of people 
with colds, and to use as a control group people who had colds but were 
not taking any existing drugs that relieve the symptoms. In this way they 
could clearly distinguish the effects of the new drug and would not feel too 
unhappy about asking the control group to suffer their colds without 
treatment. 


Activity 5 Placebo for scurvy 


Would it have been ethical for James Lind to include a placebo group in 
his experiment on scurvy patients? 


2.2 Measurement 


You may remember that in Subsection 2.1, to assess the effectiveness of 
aspirin, we suggested that people should be asked whether or not they still 
had a headache one hour after taking two aspirin or two placebos.


Activity 6 A good question?


Suppose you are designing a placebo-controlled trial to test aspirin’s 
effectiveness at reducing headaches. For comparing the treatments, what 
problems might there be in asking each subject if the headache is still 
present one hour after taking the tablets? 


[Cartoon caption: ‘This doesn't look good. I'm afraid you've
developed an immunity to placebos.’]


As noted in the solution to Activity 6, there are a number of ways that 
aspirin and a placebo might differ, both with regard to changes in the 
severity of the headache and the time the headache lasts. Some of these 
differences would be missed by asking a single yes/no question of the form 
‘Did you still have a headache an hour after taking the tablets?’. 


To get round this problem, you might ask the people to rate their 
headache on a scale such as this: 

• 0 - None. No headache.

• 1 - Mild. The headache is there but it does not bother me much.

• 2 - Moderate. The headache is definitely a nuisance.

• 3 - Severe. The headache is so bad that I can hardly think of anything
else.


You could ask them to rate their headaches on this scale before taking 
aspirin, and then every 15 minutes over a period of, say, 4 hours. 


What is the purpose of taking a rating before treatment? This is done 
because the aspirin might improve a headache without completely curing 
it, and this would still be better than nothing. It is possible to judge 
improvement only if you know how bad the headache is to start with. 
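

A short sketch may make the role of the pre-treatment rating concrete. It uses the four-point scale given above; the ratings for the single person shown are hypothetical, and the names `SCALE` and `improvement` are illustrative only.

```python
# Scale from the text: 0 = none, 1 = mild, 2 = moderate, 3 = severe.
SCALE = {0: "None", 1: "Mild", 2: "Moderate", 3: "Severe"}


def improvement(baseline, later):
    """Drop in rating relative to the pre-treatment rating (positive = better)."""
    return baseline - later


# Hypothetical ratings for one person: before the tablets, then every
# 15 minutes (only the first two hours are shown, for brevity).
baseline = 3
ratings = [3, 3, 2, 2, 1, 1, 1, 0]

for i, rating in enumerate(ratings, start=1):
    minutes = 15 * i
    print(f"{minutes:3d} min: {SCALE[rating]:8s} (improvement {improvement(baseline, rating)})")

# The rating recorded 60 minutes after taking the tablets.
rating_at_60 = ratings[60 // 15 - 1]
```

In this invented record, a single yes/no question at one hour would note only that a headache is still present, although the rating has already fallen from severe to moderate.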


Activity 7 Is a subjective scale OK?


Judging the severity of a headache is not easy. The measure is, by its 
nature, subjective. People do not find it easy to assess their own 
discomfort very accurately, or reliably. Does this matter? 


Although many symptoms, like pain, must be assessed subjectively, others 
are amenable to quantitative and, relatively speaking, more objective 
measurement. Your doctor can, for example, measure your blood pressure 
or your pulse-rate and express these measurements in widely understood 
quantitative terms (units of pressure or beats per minute). If you receive 
some form of treatment that alters your blood pressure or pulse-rate, then 
these changes can also be expressed in quantitative terms. Measurements 
such as these are objective measurements, in contrast to the 
subjective measurements mentioned earlier. 


‘Objective’ is, of course, a relative term. Different doctors using exactly the 
same equipment on the same patient might well obtain slightly different 
values of the quantity they are measuring, simply because they use the 
equipment in slightly different ways. Different doctors using different 
pieces of equipment would be still more variable in the measurements they 
produced. Although objective measurements are more cut-and-dried than 
subjective measurements, this does not mean that they possess absolute 
accuracy, nor that they are necessarily, under all circumstances, of greater 
value to the doctor than subjective measurements. Nor does it mean that 
objective measurements are free from placebo effects. Objective 
measurements can be just as susceptible to placebo effects as subjective 
ones. You are probably aware that if you are anxious your heart rate (the 
number of beats per minute of the heart) is likely to go up, as is your 
blood pressure. The outcome of objective measurements can be influenced 
by the expectations and attitudes of the patient to treatment almost as 
much as can subjective measurements. Therefore, whether the effectiveness 
of the treatment is measured objectively or subjectively, it is important 
that the clinical trial be properly designed using a control group. 


Activity 8 Beta-blocker and placebo 


Suppose that a research laboratory is setting up a clinical trial of the 
effect of a beta-blocker on blood pressure. One group of people takes the 
beta-blocker tablet, whilst the other group takes a dummy tablet (a 
placebo). Does it matter if the people know which kind of tablet they are 
taking? 


A clinical trial in which individuals do not know whether they are taking
the drug or the placebo is called a blind trial.


Activity 9 Should the doctor know who gets the placebo?


For the clinical trial in Activity 8, what about the doctors who are giving 
the treatment to the patient: does it matter if they know which people 
receive which treatment? 


If the doctors in a research laboratory are comparing two treatments, then 
they may also have expectations. They may be actively involved in the 
research project and be enthusiastic about one of the treatments. Such 
enthusiasm can be conveyed to the patients and change their expectations. 
Or a doctor might be sceptical about one of the treatments. This too can 
be conveyed to the patients and affect the outcome of the trial. (We shall 
use trial as short for clinical trial throughout this unit now.) 


As well as such direct effects on the patients, there are effects caused by 
the doctors’ involvement in assessing the outcome, i.e. their attitudes may 
affect their assessment of the outcome. They may, for example, tend to 
assess the patients on a new drug as doing better than those on a placebo; 
or, because they are aware of this possibility, they may bend over 
backwards to be fair, over-compensate, and so assess the patients on a 
placebo as better. Each of these effects will tend to shift the results in a 
particular direction (but they may partially cancel each other out). Such 
effects are examples of bias. 


Activity 10 Bias in James Lind's scurvy experiment 


When James Lind conducted his experiment on scurvy patients, his two 
worst patients ‘with tendons in the ham rigid (a symptom none of the rest 
had)’ were given a half pint of sea-water daily as their treatment. There 
were only two patients assigned to each of six treatments. How might this 
have biased his results on the sea-water treatment in a particular direction? 


The doctors who give the treatment should therefore also be ignorant of 
the nature of the treatment being given. (This may be difficult to achieve 
in practice, especially if the patient’s reactions to the drug are obvious to 
the doctor.) 


A trial where neither patients nor doctors know which treatment is 
administered is called a double-blind trial. A study in which the
patient is blind but the doctor is not, or vice versa, is sometimes 
called a single-blind trial. 


As far as is possible, all controlled clinical trials are double-blind trials. 
You may wonder how anybody knows which kind of tablet a patient is 
receiving in a double-blind trial. Usually a third party, such as a
pharmacist who is independent of the doctors and patients, keeps a record
of which tablets have been administered by which doctors to which
patients.
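

The record-keeping described above might be sketched as follows. This is an illustration under stated assumptions, not a description of any real trial system: the random split into two equal groups, the `BOTTLE-…` label format and the function name `allocate` are all hypothetical choices made for the example.

```python
import random


def allocate(patient_ids, seed=None):
    """Split patients at random into two equal groups and issue coded labels.

    Returns (labels, key): `labels` is what doctors and patients see;
    `key` is the record held only by the independent third party.
    Random allocation is an assumption of this sketch.
    """
    if len(patient_ids) % 2 != 0:
        raise ValueError("this sketch needs an even number of patients")
    rng = random.Random(seed)
    treatments = ["drug", "placebo"] * (len(patient_ids) // 2)
    rng.shuffle(treatments)
    labels = {pid: f"BOTTLE-{n:03d}" for n, pid in enumerate(patient_ids, start=1)}
    key = {labels[pid]: treatment for pid, treatment in zip(patient_ids, treatments)}
    return labels, key


patients = [f"P{n:02d}" for n in range(1, 11)]  # ten hypothetical patients
labels, key = allocate(patients, seed=2024)

# Doctors and patients see only the coded label on the bottle...
print(labels["P01"])
# ...and the key is consulted only when the results are analysed.
print(sum(1 for t in key.values() if t == "placebo"), "patients received the placebo")
```

Because the labels carry no information about the contents, neither the patient nor the doctor can tell from the bottle which treatment is being given.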





[Cartoon caption: Her attempt to stay blinded was ruined by the sign-building
side-effect of the treatment.]


As we have already said, James Lind’s 1747 experiment is credited as 
being the first-ever clinical trial. According to Bhatt (2010), the first 
double-blind clinical trial was not carried out until 1943. The trial 
investigated a treatment for the common cold. 


Activity 11 Blinding in James Lind’s scurvy experiment 


In James Lind’s 1747 experiment on scurvy, all patients were kept in the 
same space. James Lind hand picked his 12 patients, assigned their 
treatments and assessed their outcome. Though conditions on board a ship 
would have been difficult, how might it have been possible to achieve 
blinding in this experiment? 


All of the problems discussed so far in connection with the aspirin 
experiment relate to the question: 


How can you make a correct assessment of the effect of a drug on an
individual?


This may seem at first to have little to do with statistics, but, as we shall
shortly explain, statistics play an important part in helping doctors to 
analyse the results of clinical trials. Moreover, the kind of statistical 
analysis that is best suited to a particular experiment depends very much 
on the kind of control group that is used, on whether the doctor has taken 
objective or subjective measurements, and on other similar factors. 


Such factors (the nature of the control group, the kinds of measurement, 
etc.) are known collectively as the design of the experiment. One of 
the main purposes of this unit is to show that the design of an experiment 
and its statistical analysis go hand-in-hand. Before coming to this, 
however, one other major feature of drug-testing experiments has to be 
discussed: the variability of the people to whom the tests are administered. 





You have now covered the material related to Screencast 1 for
Unit 11 (see the M140 website).


2.3 Variability 


Clinical trials need both experimental and control groups of people. But 
how many people are needed in each group to assess the effect of a drug? 


Activity 12 An unsatisfactory experiment 


Suppose that two people suffering from headaches were available and you 
gave one of them aspirin and the other placebos. Why would this be an 
unsatisfactory experiment? 


Three sources of variability are readily distinguished. 


• Variability in the disease itself.


• Variability in the response: even when patients are suffering from a
similar severity of disease, their response to the drug may vary.


• Inaccuracy in the method of measurement: as we saw in
Subsection 2.2, subjective measurements of sensations like pain are
necessarily imprecise, and even objective measurements are limited in
their accuracy by imperfections in the measuring instruments and in
the people who operate them.


The relative importance of these factors will differ from trial to trial, but 
the total variability will very often be substantial compared to the effect of 
the drug itself. To illustrate this problem, we look at an example of blood 
glucose measurements.


Example 5 Blood glucose


Blood glucose levels were measured throughout the day in 24 healthy 
volunteers under controlled conditions in a clinic. All volunteers ate the 
same breakfast at 7:30, the same lunch at 12:30 and their choice of dinner 
at 18:00. In this study (Christiansen, 2006), all subjects were monitored by 
two devices and an average measurement was taken. Figure 1 shows blood 
glucose measurements from the two separate devices (SCGM 1 and
SCGM 2) for one volunteer.


Figure 1 Measurements of blood glucose on two devices
(Source: Christiansen, 2006) 


Figure 1 shows variation in measurements between the two different 
devices, and illustrates that blood glucose peaks after meals. For instance, 
the blood glucose value after lunch at 12:30 is approximately double its 
value just before lunch. 


Activity 13 Measuring blood glucose 


Given that the blood glucose level can vary so widely within a single day, 
what precautions need to be taken when testing the effect of a drug on the 
blood glucose level? 


Where possible in a study, measurements should be taken with all study 
participants in a similar state, using the same instruments in the same lab. 
In Figure 2, blood glucose levels for 21 of the volunteers in the study in 
Example 5 are plotted over a 24-hour period.


Figure 2 Blood glucose levels of 21 healthy volunteers
(Source: Christiansen, 2006) 


It is clear that there is enormous variation in blood glucose levels between 
these 21 healthy volunteers at the beginning of the day before any food is 
eaten, and also resulting from the effect of eating food. Thus there is wide 
variation in ‘normal’ blood glucose, so in studies of drugs to treat diabetes, 
researchers must try to distinguish differences between drugs from variation
between study participants. 





Often there are known reasons for differences in the physiology between 
individuals: for example, height. Factors known to influence an adult’s 
height include sex, ethnicity and nationality, and health and nutrition 
during development. These factors are called sources of variation, and 
when recorded, can be used to explain variation to some extent.


Example 6 Heart rates


Consider a trial to assess the effect of a drug on a person’s heart rate. Like 
blood glucose, heart rates vary throughout the day. Although there are 
other sources of variation in heart rate, the main source of this variation is 
physical activity. Consequently, it is important that measurements on 
study participants should be taken at comparable levels of activity. 


Interest often focuses on maximum heart rate, even though this is often 
estimated rather than measured directly. Measuring the maximum rate can 
be a lengthy and difficult procedure and, moreover, it can be dangerous to 
subject elderly people with weak hearts to the strenuous exercise required. 
The estimates are made by measuring heart rate under mild exercise and 
using the result to predict what the maximum would be. Figure 3 shows 
the mean maximum heart rate of men and women at different ages.


Figure 3 Mean maximum heart rates of men and women at different ages
(Source: Astrand and Christensen, 1964)





Activity 14 Factors to control? 


If you wished to reduce the variability in the heart rate of study 
participants, would it be advisable to ensure that the experimental and 
control groups were of the same sex, or of the same age, or both? 


Figure 3 shows the mean maximum heart rate for men and women of 
various ages, but it does not show the spread of the maximum heart rates 
of people of the same age. Individuals differ quite widely in their 
maximum heart rate: individuals of the same age undertaking a similar 
activity can have somewhat different heart rates. (Heart rate increases 
with physical activity.) 


At each of the ages covered by Figure 3, the population standard deviation 
of maximum heart rate is about 10 beats per minute. This is a measure of 
the natural variation in heart rate between people. It is against the 
background of this natural variation that the effect of any drug on heart 
rate must be judged. Suppose that a new drug reduced the maximum heart
rate, on average, by 3 beats per minute. The drug would have to be tested
on a very large sample of people before an effect as small as this could be 
distinguished from the natural variation in heart rate between people. 
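
To give a rough sense of what 'very large' might mean here, the following sketch applies a standard normal-approximation formula for comparing the means of two groups. The formula, the 5% two-sided significance level and the 80% chance of detecting the effect are assumptions made purely for illustration; none of them comes from the text above, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate number of people needed in each of two groups.

    Uses the normal-approximation formula
        n = 2 * (z_(1 - alpha/2) + z_power)^2 * sd^2 / difference^2
    for a two-sided comparison of two means with a common standard
    deviation. The default alpha and power are illustrative assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)


# Standard deviation of 10 beats per minute, drug effect of 3 beats per minute.
n_small_effect = sample_size_per_group(sd=10, difference=3)
# A much larger effect of 10 beats per minute, for comparison.
n_large_effect = sample_size_per_group(sd=10, difference=10)
print(n_small_effect, n_large_effect)  # prints: 175 16
```

Under these assumptions, roughly 175 people would be needed in each group to pick out a 3 beats per minute effect, compared with about 16 per group for an effect as large as the standard deviation itself.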





Example 7 Propranolol and heart rate 


Beta-blockers are a group of drugs primarily used in the treatment of high 
blood pressure, angina pectoris and some other conditions of the heart and 
the blood’s circulatory system. Figure 4 shows how heart rate is affected 
by one beta-blocker, propranolol.


Figure 4 The effect of propranolol on heart rate
(Source: Ekblom et al., 1972) 


The effect varies according to the vigour of people’s activity. Each orange 
dot denotes the mean of the heart rates of four people when they were 
given propranolol. Each black dot denotes the mean of the heart rates of 
the same four people when they received no drug. The vertical line 
through each dot denotes the spread of those four measurements. 


When the subjects (people) are resting, the effect of the drug is hardly 
detectable: the mean of the heart rates of the subjects after they have 
received the drug is slightly lower than that for the same subjects when 
they have not, but the spread of the individual heart rates (indicated by
the vertical line through each data point) is large enough to mask this 
effect. When the subjects are undertaking more vigorous activity, the 
effect of the drug becomes much more noticeable and, when they are 
engaged in extremely vigorous activity, the difference is very large and 
could easily be detected despite the spread of the individual heart rates 
also being larger than when the subjects are at rest. 





Propranolol 


Propranolol was the first successful beta-blocker developed. In 1988, 
Sir James Black (1924-2010) was awarded the Nobel Prize in 
Medicine for its discovery. Newer beta-blockers are now used to treat 
high blood pressure, but propranolol is still used to treat other 
conditions. 


Variability is the main reason why experimenters need to use statistics. 
Drug companies do not just want to know what happens to those patients 
who take a new drug in one particular clinical trial. They want to know 
the effect their drug would have on a much wider population: for example, 
the population of all people in the UK who might, now or in the future, 
suffer from a particular disease. They cannot test the drug on all the 
members of this population (some of them might not even have the disease 
yet!), so the drug is tested on a sample of people from the population, and 
the experimenters must make an inference from the results for this sample 
back to the population. 


If people did not vary at all, then the experimenters could find out all they 
needed to know by trying out their drug on just one person. They would 
have no need for statistics: a situation that, no doubt, they would greatly 
welcome! However, people vary greatly, and small effects are often 
important in medical contexts, so the need for statistics in drug-testing is 
paramount. 


Exercises on Section 2 





Exercise 1 Measurement scale 


Which of the following features can be measured only on a subjective scale 
and which can be measured on an objective scale? 


(a) Weight loss. 
(b) Appetite. 

(c) Sore throat. 
(d) Indigestion.


Exercise 2 Controlling factors in a double-blind trial


Several volunteers agree to take part in a clinical trial of a new drug which, 
it is hoped, will relieve depression. The volunteers are divided into two 
groups whose compositions in terms of age and sex are as similar as 
possible. Double-blind trials of the drug are carried out. 


(a) What sources of variability between the experimental and control 
groups might not have been controlled by this procedure? 


(b) Why might it not be justifiable to expect that the results obtained 
from this clinical trial would apply to all people suffering from 
depression? 


(c) How might some of the problems that you have identified in parts (a) 
and (b) be overcome? 





Exercise 3 Sources of bias? 


In a large-scale clinical trial of a new anti-depression drug, several 
experimenters assess the drug’s effect on patients by interviewing them. 
Each experimenter uses a standardised questionnaire to assess the severity 
of a patient’s symptoms. What sources of bias in assessing the effect of the 
drug might there be in such a trial? 





3 Design of clinical trials 


Now that you have been introduced to some of the fundamental problems 
in designing and carrying out clinical trials (and other experiments), it is 
possible to go further and look at certain aspects of their design in more 
detail. Briefly, the two main requirements for a well-designed clinical trial 
are as follows. 


• It must eliminate all known forms of bias as far as possible.


• It must be as sensitive as possible; that is, it must, without requiring
a large number of patients or a large amount of time, have a good
chance of accurately detecting any difference between the treatments
being tested despite the variability of patients.


3.1 Crossover design 


A simple way of eliminating a great deal of the variability in a trial is to 
give each person taking part both treatments. In this way, individuals act 
as their own controls. This procedure eliminates a lot of the variability 
that arises when different individuals act as experimental and control 
subjects. As an example, suppose that doctors wish to compare the effect 
of a beta-blocker with that of a placebo on blood pressure over a short 
period (e.g. two months). They could give each patient the placebo for 
eight weeks and then give them the active drug for eight weeks, measuring 
each patient’s blood pressure before they started and during the two 
treatment periods. Of course, they would want to eliminate the possibility 
of a placebo effect that was greater at the beginning than at the end of 
treatment, so they would give half of the patients the placebo first, 
followed by the active treatment, whereas the other half would get the 
active treatment first followed by the placebo (see Figure 5). A similar
design could be used to compare two different beta-blockers (see Figure 6).


This kind of design is called a crossover trial. (Strictly speaking, it is a 
two-period crossover trial.) A crossover trial is thus one in which, during
the course of the experiment, each subject crosses over from receiving one
treatment to receiving the other, or vice versa.
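
The claim that individuals acting as their own controls removes much of the variability can be illustrated with a small simulation. This is only a sketch with invented numbers: the blood pressure level, the size of the drug effect and both standard deviations are assumptions, and the simulation ignores the order of the two periods and any carry-over effect.

```python
import random
from statistics import stdev

rng = random.Random(1)  # fixed seed so that the sketch is repeatable

N_PATIENTS = 200
TRUE_EFFECT = -10.0        # assumed effect of the drug (illustrative only)
BETWEEN_PERSON_SD = 15.0   # assumed variation between patients
WITHIN_PERSON_SD = 3.0     # assumed variation between measurements on one patient

on_placebo = []
on_drug = []
for _ in range(N_PATIENTS):
    usual_level = rng.gauss(150.0, BETWEEN_PERSON_SD)
    on_placebo.append(usual_level + rng.gauss(0.0, WITHIN_PERSON_SD))
    on_drug.append(usual_level + TRUE_EFFECT + rng.gauss(0.0, WITHIN_PERSON_SD))

# Crossover analysis: each patient's own difference (drug minus placebo).
differences = [d - p for d, p in zip(on_drug, on_placebo)]

spread_between_people = stdev(on_placebo)
spread_of_differences = stdev(differences)
print(round(spread_between_people, 1), round(spread_of_differences, 1))
```

Because each patient's usual level cancels out when the difference is taken, the spread of the within-patient differences is much smaller than the spread of measurements between different patients, which is what makes the drug effect easier to detect.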


Figure 5 Crossover trial to compare a beta-blocker with a placebo


Figure 6 Crossover trial to compare two beta-blockers




Activity 15 Unusual assumption of a crossover trial 


What unusual assumption is made in the crossover trial design? Write 
down one case where this assumption is not valid. 


If doctors were comparing two antibiotics to treat a bacterial infection, 
then there would be no point in using a crossover trial because they would 
expect their patients, or at least some of them, to be cured by the 
treatment. Therefore the group of patients entering the second period, 
after receiving antibiotics in the first, will not be in the same condition as 
when they started the first period. Besides curing the patient, any effect of 
the drug that tends to last beyond the treatment (that is, any carry-over 
effect) will confuse the issue, so that drugs should usually be given for 
some time before measurements are made. 


Thus the crossover design is most suited to chronic conditions 
(conditions that are relatively long-lasting) such as high blood pressure, 
diabetes or arthritis, and is not well suited to conditions which are given to 
spontaneous ups and downs. Consequently the crossover trial is of rather 
limited application but, when it is suitable, in many cases it is the best 
design to use. 


Activity 16 Can a crossover design be used? 


State whether a crossover design would be appropriate for the following 
trials. 


(a) Type 1 diabetes is a chronic condition in which patients have high 
blood sugar. It cannot be cured but it can be controlled by giving 
insulin. However, insulin can cause hypoglycaemia, where blood sugar 
becomes too low. A trial is to be set up to compare two different 
regimens of insulin administration to control type 1 diabetes in 
patients. The outcome measurement is the number of hypoglycaemic 
episodes. 


(b) A trial on scurvy and two diet supplements, similar to James Lind’s 
experiment. Scurvy can be cured if vitamin C is included in the diet, 
but without this, scurvy will eventually result in death. The main 
outcome measurements are whether or not a patient is cured and the 
time it takes to be cured. 


You have now covered the material related to Screencast 2 for 
Unit 11 (see the M140 website). 




3.2 Matched-pairs design 


Even if it is impossible to give both treatments to each individual, it is 
often possible to pair individuals into matched pairs by identifying 
particular features of the individuals or of the disease from which they are 
suffering, that are likely to be important to the outcome of any treatment. 
The doctors can then give one treatment to one member of the pair and 
the other treatment to the other member (see Figure 7). 


Figure 7 Matched-pairs design with a placebo


A trial that is designed in this way is called a matched-pairs trial. 
Examples of factors that might be matched are age, sex, severity of illness, 
length of time the patient has suffered from the illness, and the presence of 
other illnesses. Such a trial can be very useful but again has serious 
limitations. If a doctor finds an elderly lady suffering from a mild illness of 
one year’s duration, can another be found with whom to match her? How 
long will the doctor have to wait until a suitable match is found? Another 
problem is whether there is enough information about which factors really 
do affect the outcome of a treatment. If it does not matter whether the 
patient is a man or a woman, then there is no point in matching patients 
by sex. If, on the other hand, people of different blood groups respond
differently to the treatment, then failure to match patients by blood group
would reduce the usefulness of the clinical trial.


Twins make ideal matched pairs





A matched-pairs trial is more appropriate for fairly common, long-lasting
diseases for which special clinics exist or lists of patients are available.
Examples are diabetes and asthma. In this case, the patients can be
selected and matched (i.e. paired) before the trial starts. In diseases where
patients turn up and need to be treated more or less at once, a 
matched-pairs trial is harder to organise. 
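
When a list of patients is available in advance, the matching step can be sketched in code. The patients below are hypothetical, and the rule used (exact agreement on sex and smoking status, then pairing neighbours in age order) is just one simple possibility, assumed here for illustration; it is not a method prescribed by the text.

```python
from itertools import groupby

# Hypothetical patients: (identifier, sex, smoker, age). Not data from the text.
patients = [
    ("P1", "F", True, 52),
    ("P2", "M", False, 30),
    ("P3", "F", True, 58),
    ("P4", "M", False, 33),
    ("P5", "F", False, 61),
    ("P6", "F", False, 40),
]


def match_pairs(people):
    """Pair people who agree on sex and smoking status, neighbouring ages together."""
    ordered = sorted(people, key=lambda p: (p[1], p[2], p[3]))
    pairs, unmatched = [], []
    for _, group in groupby(ordered, key=lambda p: (p[1], p[2])):
        members = list(group)  # already in age order within the group
        for i in range(0, len(members) - 1, 2):
            pairs.append((members[i][0], members[i + 1][0]))
        if len(members) % 2 == 1:
            # An odd one out has nobody left to be matched with.
            unmatched.append(members[-1][0])
    return pairs, unmatched


pairs, unmatched = match_pairs(patients)
print(pairs, unmatched)
```

Pairing neighbours in age order is not guaranteed to give the closest possible matches when a group has an odd number of members, and a real trial would also have to decide how large an age gap is acceptable.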


Activity 17 Organising matched pairs 

Organise the following eight patients into matched pairs. Match using the 
following order of importance: sex, smoking status, age. 

Male, smoker, aged 43,
Female, non-smoker, aged 41,
Male, non-smoker, aged 47,
Female, non-smoker, aged 49,
Male, smoker, aged 44,
Male, smoker, aged 46,
Male, non-smoker, aged 45,
Male, smoker, aged 48.


You have now covered the material related to Screencast 3 for 
Unit 11 (see the M140 website). 




3.3 Group-comparative design 


Very often it is not possible to use either a crossover or a matched-pairs 
trial: the medical condition involved may not be long-lasting (this rules 
out a crossover trial) and patients cannot be neatly divided into pairs 
according to factors thought to influence response to treatment (this rules 
out a matched-pairs trial). There is then little choice but simply to divide 
the patients into the two groups and to give one treatment to the patients 
in one group and the other treatment to those in the other (Figure 8). The 
principle behind this division into groups should be to ensure that each 
group is representative of the population being studied, and that the 
allocation of treatments to patients is decided at random. 


For example, if gender and age were thought to influence a person’s 
response to treatment, then care would be taken so that the control group 
and the treatment group each contained similar male-to-female ratios and 
that each group had similar age profiles. Within these restrictions, patients 
would be allocated to the treatment group or control group at random. 
This is discussed further in the next subsection and forms the basis of the 
group-comparative trial. It can be used in a very wide variety of 
situations and it is the most commonly used design of clinical trial.


Figure 8 Group-comparative design with a placebo


You have now covered the material related to Screencast 4 for
Unit 11 (see the M140 website).





3.4 Randomisation 


Each of the three designs of trial described in Subsections 3.1 to 3.3 
involves an allocation process in which some decision has to be taken 
concerning the patients involved in the trial. For example: 


• In a crossover trial to compare a new drug with a placebo, you need to
decide, for each patient, whether they receive the drug first and then
the placebo or vice versa.


• In a matched-pairs trial to compare two drugs, after pairing the
patients you need to decide, within each pair, which patient is given
which drug.


• In a group-comparative trial to test a new drug, you need to decide,
for each patient, whether they should be in the experimental group or
the control group.


We shall now explain the importance of using random methods in these 
allocation processes, using a group-comparative design. Similar reasoning 
can be applied to other designs of clinical trial and similar random methods 
of allocation can be devised. The use of such random methods is called 
randomisation: it is an important part of the design of clinical trials and 
of other experiments whose results are to be analysed using statistics. 


If individuals with a particular trait are more likely to be selected for the 
experimental group, while study participants of another kind are more 
likely to be selected to receive the control, whether this selection is 
conscious or not, any comparison of the two groups may be biased. This is 
known as selection bias, and the most important reason for 
randomisation is to eliminate this bias. The process of randomisation may 
also facilitate blinding the identity of treatments to the study investigators 
and participants. Finally, using random methods to allocate treatments 
allows the use of probability theory to express how likely it is that any 
difference in outcome between groups occurred by chance. 


As described briefly above, designing a group-comparative trial involves 
finding a procedure for allocating patients to the two groups. Choosing 
representative groups is a similar problem to that of choosing a
representative sample for a survey (covered in Unit 4). For a survey, the
sampling method should produce a sample of the required size. Similarly, 
it would be bad in a clinical trial if a very large number of the patients 
ended up in one group and only a few in the other. 


Suppose that you wish to test a new drug against an existing drug. 
Sometimes, when the effects of the control treatment (the existing drug) 
are particularly well known, experimenters use a control group that is 
considerably smaller than the experimental group, but, on the whole, it 
seems reasonable to insist that the numbers of people in the two groups 
should be approximately equal. One way of achieving this would be to 
allocate alternate patients to the two groups (experimental and control). 
This method is similar to systematic random sampling (see Subsection 2.2 
of Unit 4) and is sometimes the best method. However, there is a danger 
that the doctors observing the patients might discover that this method 
had been used; for example, if the odd-numbered patients did not do so 
well as the even-numbered patients, or if alternate patients developed a 
particular side effect. The trial would not then be a double-blind trial. 


Activity 18 Avoiding patterns in treatment allocation 


How could the danger described above, that the doctors observing the 
patients discover the allocation to the two groups, be reduced? 


There are several ways in which a patient could be randomly allocated to 
the experimental or control group. Perhaps the simplest method is to toss 
a coin for each subject. If it comes down ‘heads’, then the patient is 
allocated to the experimental group; if ‘tails’, then the patient goes in the 
control group. In practice, a random number generator on a computer 
would be used in place of the coin. 
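
A minimal sketch of this coin-tossing allocation, using a computer's random number generator, might look as follows. The function name, the patient numbering and the seed are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def toss_coin_allocation(patient_ids, seed=None):
    """Allocate each patient independently, like tossing a fair coin."""
    rng = random.Random(seed)  # the seed only makes the sketch repeatable
    allocation = {}
    for patient_id in patient_ids:
        heads = rng.random() < 0.5
        allocation[patient_id] = "experimental" if heads else "control"
    return allocation


allocation = toss_coin_allocation(range(1, 21), seed=11)
group_sizes = Counter(allocation.values())
print(group_sizes)
```

Because each patient is allocated independently, nothing in this method forces the two groups to be the same size; with only 20 patients the split can be noticeably uneven.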





In Unit 4 (Subsection 1.3) you saw that random samples tend to be 
representative. In practice this means that, if the number of patients is 
large enough, then the use of random numbers to allocate them to 
experimental and control groups generally results in the two groups being 
fairly similar in all their features, including features that might not have 
occurred to the experimenter as being important. For instance, the ratio of 
males to females will be about the same in the two groups, as will the 
proportions of people of different ages, etc. However, often it makes sense 
to build balance into a trial design, and we will briefly explain why this is 
and how it is done. (You have already seen in Activity 10 (Subsection 2.2) 
the kind of problem that can occur if there is not balance.) 


If a trial is small, there is a risk that the control and treatment groups may 
have very uneven numbers. One solution to this would be to decide the 
number of participants in the study beforehand, so the number to be 
allocated to each group is known and these can be randomly ordered. This 
would be similar to putting the appropriate number of ‘A’s and ‘B’s into a 
bag and withdrawing the letters to assign the groups. 
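
The 'letters in a bag' idea can be sketched by shuffling a list that contains equal numbers of each letter. The function name and the assumption of an even, pre-decided number of participants are illustrative choices.

```python
import random


def bag_of_letters_allocation(n_patients, seed=None):
    """Return a random ordering of equal numbers of 'A's and 'B's.

    The k-th patient to enter the trial receives the k-th letter,
    so the two groups are guaranteed to be the same size.
    """
    if n_patients % 2 != 0:
        raise ValueError("this sketch assumes an even number of patients")
    letters = ["A"] * (n_patients // 2) + ["B"] * (n_patients // 2)
    random.Random(seed).shuffle(letters)  # like drawing letters from a bag
    return letters


sequence = bag_of_letters_allocation(12, seed=3)
print(sequence, sequence.count("A"), sequence.count("B"))
```

Unlike tossing a coin for each patient, this approach fixes the group sizes in advance while still leaving the order of allocation to chance.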


A trial may be carried out over several different locations, such as different 
clinics, known as centres, or over several time periods, for example, spring 
admissions and winter admissions. It is preferable to balance numbers 
assigned to the treatment and control groups within each centre and/or 
time period. Similarly, the investigators may decide that it is important to 
balance allocation to groups for other important variables, such as age or 
disease progression. This is achieved using stratified randomisation, 
with strata defined for centre, time period, age or disease progression 
status as necessary. (Stratified sampling was described in Subsection 4.2 of 
Unit 4.) Each stratum is treated as if it has its own separate mini-trial, 
and randomisation for each stratum is carried out independently. This is 
only achievable with sufficient numbers of trial participants within each 
stratum. 
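
As a sketch of how stratified randomisation might be carried out, the code below treats each stratum as its own mini-trial and randomises within it. The centre names and patient identifiers are hypothetical, and giving the extra place to the control group when a stratum has an odd number of patients is an arbitrary assumption of this sketch.

```python
import random
from collections import defaultdict


def stratified_randomisation(patients, seed=None):
    """Randomise separately within each stratum.

    `patients` is a list of (patient_id, stratum) pairs.
    Returns a dictionary mapping patient_id to 'treatment' or 'control'.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for patient_id, stratum in patients:
        by_stratum[stratum].append(patient_id)

    allocation = {}
    for stratum, ids in by_stratum.items():
        half = len(ids) // 2
        labels = ["treatment"] * half + ["control"] * (len(ids) - half)
        rng.shuffle(labels)  # independent random order within this stratum
        allocation.update(zip(ids, labels))
    return allocation


# Hypothetical trial with two centres of six patients each.
patients = [(f"N{i}", "North clinic") for i in range(1, 7)]
patients += [(f"S{i}", "South clinic") for i in range(1, 7)]
allocation = stratified_randomisation(patients, seed=5)
north_treated = sum(allocation[f"N{i}"] == "treatment" for i in range(1, 7))
south_treated = sum(allocation[f"S{i}"] == "treatment" for i in range(1, 7))
print(north_treated, south_treated)  # prints: 3 3
```

Each centre ends up with three patients on each arm, whatever the random order turns out to be.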


The following summarises the three different designs of clinical trial that 
have been considered in this section. 


• The crossover trial. Each person acts as his or her own control:
during the course of the trial, each person crosses over from
having one treatment to having the other, or vice versa.


• The matched-pairs trial. Each person in one group is matched as
closely as possible with a person in the other group.


• The group-comparative trial. People are allocated randomly to
two groups, usually in such a way that the two groups contain
approximately the same number of people.


The following box summarises their drawbacks and advantages.


• The crossover trial eliminates the variability that would arise
from using different people in the experimental and control
groups, but it cannot be used when the experimental treatment
irreversibly alters a patient’s condition, nor is it suited to
short-lasting diseases.


• The matched-pairs trial eliminates much of the variability that
arises from using different people as experimental and control
subjects, but it can be difficult to achieve a good match between
the experimental and control groups.


• The group-comparative trial does not eliminate the variability
that arises from using different individuals in the experimental
and control groups, but it is relatively easy to set up.


Exercises on Section 3 





Exercise 4 What type of trial is being used? 


State which of the three designs of trial is being used in each of the 
following experiments: a crossover, a matched-pairs or a 
group-comparative design. 


(a) Ten pairs of identical twins are found. The experimenter allocates one 
of each pair of twins at random to the experimental group and one to 
the control group. He administers a new drug to the experimental 
group and a placebo to the control group. 


(b) Eighty people suffering from arthritis are divided randomly into two 
groups. One group receives a new drug, the other a placebo. 


(c) Another eighty people suffering from arthritis are divided randomly 
into two groups. One group receives a new drug for three months and 
then a placebo for another three months. The other group receives the 
placebo for three months and then the drug for three months. 





Exercise 5 What type of trial should be used? 


Of the three designs of clinical trial, which would be most appropriate to 
use in each of the following tests? 


(a) To test a drug which alleviates an unpleasant and long-lasting 
symptom of a disease (e.g. pain) without curing the disease. 


(b) To test a drug which improves a condition that is rare and requires 
immediate treatment. 





(c) To test a drug which helps people to give up smoking. 


“Your doctor will be here in a minute, I’m a placebo.”




4 Analysing the data 


In this section we shall describe the type of analysis commonly used for 
data from clinical trials like those in the last section. This will include 
more general points concerning the collection and analysis of data from 
scientific experiments. 


4.1 What analysis? 


Imagine that a research laboratory has completed a clinical trial of a new 
drug. The researchers decided the design for their experiment, selected the 
patients and used the appropriate random allocation process. They then 
administered the appropriate treatments (one experimental and one 
control) to the correct patients at the correct time and measured the 
effects, i.e. they collected the data. They now want to investigate whether 
the experimental treatment really differs from the control treatment in its 
effect on patients. How should they analyse these data?


It would be appropriate to use hypothesis testing. Units 6-10 described 
several statistical tests that can be used to investigate whether a 
hypothesis like this is tenable. 


Activity 19 The null and alternative hypotheses 


Suppose that a drug company wishes to compare the effect of a new drug 
on headaches with that of an existing drug. In the terminology of Unit 7 
(Subsection 1.3), what would be the null hypothesis, and against what 
alternative hypothesis should the null hypothesis be tested? 


Activity 20 A one- or two-sided test? 


Section 6 of Unit 10 discussed one-sided and two-sided hypothesis tests. 
Would a one-sided or a two-sided test be more appropriate to test the 
hypotheses given in Activity 19? 


You might have thought that a one-sided test would be appropriate in 
Activity 20, on the grounds that the drug company would not be 
interested in testing a drug unless they were fairly certain that it could not 
be less effective than an existing treatment. However, even if the company 
do think that, the results of their experiment have to be made available to 
the European Medicines Agency (EMA) for scrutiny, and the EMA do not 
wish to rule out the possibility that the new drug is worse than the old by 
using a one-sided test.


4.2 Hypothesis testing


You have already met several hypothesis tests, and there are many others 
available. In analysing data from experiments like those described in 
Section 3, each test does roughly the same job. The experimenter sets up a 
null hypothesis of no difference (in median or mean) between two 
populations. The data consist of measurements made on samples from 
each of the two populations. The outcome of the hypothesis test is that 
the null hypothesis either is or is not rejected, on the basis of a test 
statistic calculated from the observed data from the two samples. 

Section 4 of Unit 6 and Subsection 1.3 of Unit 10 describe the process of 
hypothesis testing, which is summarised in Figure 9. 








Set up HYPOTHESIS
Find value of TEST STATISTIC
Look up CRITICAL VALUE
COMPARE test statistic with critical value
Reject null hypothesis, or do not reject null hypothesis


Figure 9 Steps in a hypothesis test


To use a test, the experimenter might decide the significance level, which is 
the probability of incorrectly rejecting the null hypothesis when it is 
actually true. Alternatively, the p-value (significance probability) given by
the test might be used to evaluate the strength of evidence against the null
hypothesis.


In the present context, if the clinical trial produces two batches of 
measurements, one on the effect of the new drug on headache pain and the 
other on the effect of the existing drug, then a hypothesis test at a given 
significance level will help to decide whether the two drugs differ in their 
effects when they are applied to the population from which the samples 
were chosen. 
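
As a rough illustration of the steps in Figure 9, the following Python sketch carries out a two-sided two-sample z-test from summary statistics. The function name and the summary figures for the two drugs are assumptions made purely for illustration (they are not data from a real trial), and a z-test of this kind is only appropriate when both samples are large.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b, significance_level=0.05):
    """Two-sided two-sample z-test of H0: the two population means are equal."""
    # Find the value of the test statistic.
    standard_error = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / standard_error
    # Look up the critical value (about 1.96 at the 5% significance level).
    critical_value = NormalDist().inv_cdf(1 - significance_level / 2)
    # Compare the test statistic with the critical value.
    reject_null = abs(z) >= critical_value
    return z, critical_value, reject_null


# Hypothetical summary data: reduction in headache pain score,
# 50 patients on the new drug and 50 on the existing drug.
z, critical_value, reject_null = two_sample_z_test(4.2, 1.5, 50, 3.5, 1.6, 50)
print(f"z = {z:.2f}, critical value = {critical_value:.2f}, reject H0: {reject_null}")
```

With these invented figures the test statistic is about 2.26, which exceeds the 5% critical value of 1.96, so the null hypothesis of no difference would be rejected at the 5% significance level.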


It is important to remember that what the doctor, scientist, or anyone else 
decides to do with the result of a hypothesis test is a matter of human 
judgement, not statistics. One factor influencing this decision should be 
the significance level of the test, so the user of the test must decide what 
significance level to use. In science and medicine there is a strong 
convention of using a 5% significance level, but there is nothing sacred 
about 5%. There may be occasions when it is sensible to use a different 
significance level: for example, 1% or (occasionally) 10%. 


The following table was given in Section 5 of Unit 6 for the interpretation 
of p-values. Although the interpretation should inform subsequent decision 
making, deciding future actions is more complex than simply examining a 
p-value. 


Table 1 Interpretation of p-values 


p-value             Rough interpretation 

p > 0.10            Little evidence against the hypothesis 
0.10 > p > 0.05     Weak evidence against the hypothesis 
0.05 > p > 0.01     Moderate evidence against the hypothesis 
0.01 > p > 0.001    Strong evidence against the hypothesis 
0.001 > p           Very strong evidence against the hypothesis 
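
The rough interpretations in Table 1 could be encoded as a small Python function, as in the sketch below. The function name is hypothetical. Table 1 uses strict inequalities and so does not say what to do when p falls exactly on a boundary; here a boundary value such as p = 0.05 is assigned to the stronger-evidence category, which is an assumption of this sketch rather than a rule from the module.

```python
def interpret_p_value(p):
    """Return the rough interpretation of a p-value, following Table 1."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p > 0.10:
        return "Little evidence against the hypothesis"
    if p > 0.05:
        return "Weak evidence against the hypothesis"
    if p > 0.01:
        return "Moderate evidence against the hypothesis"
    if p > 0.001:
        return "Strong evidence against the hypothesis"
    return "Very strong evidence against the hypothesis"


for p in (0.2, 0.07, 0.03, 0.004, 0.0002):
    print(p, "->", interpret_p_value(p))
```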


The critical region is smaller for a hypothesis test at the 1% significance 
level than at the 5% significance level — if the null hypothesis is rejected at 
the 1% level then it will also be rejected at the 5% significance level. This 
has the following obvious consequences. (See Subsection 5.2 of Unit 8, on 
errors.) 


• The probability of a type 1 error is less if a 1% significance level is 
used, than if a 5% significance level is used; that is, there is a smaller 
chance of incorrectly rejecting the null hypothesis when it is really 
true. 


• Other things being equal, the probability of a type 2 error will be 
greater using a 1% significance level than using a 5% significance level; 
that is, using a 1% significance level you are more likely to fail to 
reject the null hypothesis when it is really false. 


Overall then, a test using a 1% significance level is more cautious about 
rejecting the null hypothesis than is the same test using a 5% significance 
level. When deciding the significance level to use in a particular trial 
design, doctors and other scientists have to take into account all the 
medical, ethical, social and financial implications of the clinical trials that 
they are planning and of the decisions that they have to make. 


Table 2 Type 1 and type 2 errors 


                    H0 true         H0 false 
H0 not rejected     Correct         Type 2 error 
H0 rejected         Type 1 error    Correct 
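
The trade-off between the two types of error can be illustrated numerically. The sketch below assumes a two-sided z-test whose test statistic is normally distributed with standard deviation 1, and whose mean is shifted away from 0 by a hypothetical amount when the null hypothesis is false. The function name and the shift of 2.5 are assumptions chosen only for illustration.

```python
from statistics import NormalDist


def type_2_error_probability(shift, significance_level):
    """Probability of not rejecting H0 in a two-sided z-test when the
    test statistic has mean `shift` (and standard deviation 1)."""
    std_normal = NormalDist()
    critical_value = std_normal.inv_cdf(1 - significance_level / 2)
    # H0 is not rejected when the test statistic lies between
    # -critical_value and +critical_value.
    return std_normal.cdf(critical_value - shift) - std_normal.cdf(-critical_value - shift)


shift = 2.5  # hypothetical true shift, in standard-error units
beta_5 = type_2_error_probability(shift, 0.05)
beta_1 = type_2_error_probability(shift, 0.01)
print(f"P(type 2 error) at 5% level: {beta_5:.3f}")
print(f"P(type 2 error) at 1% level: {beta_1:.3f}")
```

Under these assumptions the probability of a type 2 error is roughly 0.29 at the 5% level and roughly 0.53 at the 1% level: lowering the chance of a type 1 error has raised the chance of a type 2 error.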


All the hypothesis tests that were introduced in Units 6-10 involved the 
assumption that the data came from samples chosen at random from the 
populations of interest. In drug-testing, although random methods are 
used (e.g. to allocate patients to the experimental and control groups), it is 
not common to use the kind of random sampling methods described in 
Unit 4 to choose the patients who are going to take part in the trial. 
Indeed, it is usually impossible to do so. 
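
The distinction between random allocation and random sampling can be made concrete. In the sketch below, the patients are whoever happens to be in the trial (they are not a random sample from any population), but chance alone decides which group each one joins. The function name, the patient identifiers and the equal split into two groups are assumptions for illustration.

```python
import random


def allocate_randomly(patient_ids, seed=None):
    """Randomly split the patients already in a trial into two groups."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    experimental_group = shuffled[:half]
    control_group = shuffled[half:]
    return experimental_group, control_group


# Hypothetical trial with 20 patients, identified by the numbers 1 to 20.
experimental_group, control_group = allocate_randomly(range(1, 21), seed=1)
print("Experimental group:", sorted(experimental_group))
print("Control group:     ", sorted(control_group))
```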


Example 8 Influenza 


Suppose that a drug company wants to know about the effect of a new 
drug on the symptoms of influenza in people living in the UK. The 
company could select two random samples of people from the population 
of the UK, giving the people in one sample the new drug and those in the 
other sample an existing treatment. However, unless everyone in both 
samples already had influenza, there would be nothing to measure. Since it 
would not be considered ethically acceptable to infect the people in the 
samples with influenza, this approach will not work. 


The experimenters must choose the groups for this trial from people who 
already have influenza, and proceed as if these people are a random sample 
from the population of people in the UK who might have caught influenza. 
This should result in sound inferences back to this population, because the 
groups will usually form samples that are representative of the population 
even if they are not strictly random samples. 





In practice, experimenters are often not very precise in specifying exactly 
to which populations their results refer. This is reflected in a common, but 
misleading, piece of jargon which you may well have heard. An 
experimenter might say casually that the experimental and control groups 
in their experiment differed significantly. By this they probably mean the 
following. 


‘I have carried out a hypothesis test, using the null hypothesis that the 
populations from which my two groups were chosen did not differ in 
their means, medians (or some other measure). The result of the 
hypothesis test was that the null hypothesis of no difference was 
rejected.’ 


Thus, to say that two samples differ significantly is to infer something 
about the populations from which those samples were drawn. Another 
point to remember is that a difference which is significant in this sense 
may be of no practical significance whatsoever. 


You may also see the following type of sentence in a report of a scientific 
experiment. 


‘The two groups differed significantly (p < 0.05).’ 


This has a similar interpretation, but gives the significance level: there is 
evidence that the groups come from populations whose means are not 
identical, on the basis of a hypothesis test using a significance level of 5%. 


Although there are many hypothesis tests available, this does not mean 
that researchers can choose any test they like, because each test requires 
certain conditions to be fulfilled before it can be used, and these conditions 
vary from one test to another. In particular, when choosing a test it is 
important to ask the following questions. 


• What type of data has the experiment produced? 


• Does the design of the experiment involve matching (i.e. pairing) the 
subjects in the two groups? 


We shall now consider both questions in the following two subsections. In 
Subsection 4.3 we will consider the types of data. Then in Subsection 4.4 
we will consider how the design of the experiment affects the choice of test. 


4.3 Types of data 


In earlier parts of the module you have met different kinds of data. First, 
there are categorical (or nominal) data. This form of data arises when 
there are several mutually exclusive categories and each item or person 
belongs in exactly one of the categories. Here, the data do no more than 
identify the category each item or person belongs in. For example, you 
could ask parents what medicine they used when their child last had a 
temperature, and record their answers as paracetamol, ibuprofen, none or 
other. There is very little quantitative information contained in the 
response of a single parent. You cannot say that a paracetamol-containing 
medicine is quantitatively different (e.g. larger or smaller) from an 
ibuprofen-containing medicine. You can, however, count the number of 
parents who fall into each category. 

Various hypothesis tests are available that can be applied to categorical 
data: the tests introduced in this module are the sign test (Sections 4 
and 5 of Unit 6) and the χ² test (Section 4 of Unit 8). The sign test could 
be used to examine, say, whether paracetamol or ibuprofen was preferred 
by more parents. The χ² test could be used to examine, for example, 
whether the medicine preference for children in one group of parents 
differed significantly from another group of parents. 


Another type of data, called ordinal data, contains more quantitative 
information than categorical data. A Likert scale, introduced in Unit 4 
(Subsection 3.1), is an example of a measure that gives ordinal data. 

A researcher asks patients to rate their sleep quality on a scale from 1 to 4, 
with 4 signifying that their sleep quality is very good, 3 fairly good, 
2 fairly bad and 1 that it is very bad. Here one can say that a rating of 4, 
for example, is quantitatively different from a rating of 2. It is obviously 
higher. Data on an ordinal scale like this can therefore be ordered, but it is 
not really possible to do anything more ambitious with them than that. It 
would not really be possible to say that a person who rated their sleep as 3 
had three times the quality of sleep of a person who rated it 1! It would 
not even be possible to say that the difference between ratings of 3 and 4 
was the same as the difference between ratings of 1 and 2; to be precise, it 
does not make sense to subtract these ratings. 


There are specific hypothesis tests available for ordinal data, though they 
are not taught in this module. It is possible to analyse ordinal data as 
nominal data, though information about ordering is lost. 


Measurements on an interval scale (interval scale data) contain still 
more quantitative information. Data of this kind are what you might think 
of as real measurements, since they are actual quantities in definite units, 
such as height in centimetres, age in years, heart rate in beats per minute 
and so on. Here it does make sense to say that, for instance, the difference 
between two people’s heights of 147 cm and 151 cm is the same as the 
difference between 185 cm and 189 cm. 


Again, there are statistical techniques which are suitable for testing 
hypotheses about interval scale data. These include z-tests, described in 
Unit 7, and t-tests, described in Unit 10. 


Examples of all three types of data occur in medicine. 


Activity 21 Types of data 

For each of the following, say whether the recorded information gives 
categorical, ordinal or interval scale data. 

(a) Body temperature (in °C). 

(b) Pain, rated on a four-point scale from mild to severe. 


(c) A woman of childbearing age’s reproductive state: pregnant or not 
pregnant. 


(d) Systolic blood pressure (in mmHg). 


You have now covered the material related to Screencast 5 for 
Unit 11 (see the M140 website). 


4.4 Which test? 


Nominal data that can be easily categorised and placed in a contingency 
table arise frequently from clinical trials. For example, the number of test 
subjects meeting some criterion that determines whether a drug is successful 
can easily be tabulated. In Unit 8, you met contingency tables and were 
shown how to apply the χ² test. The χ² test can also be applied to clinical 
trial data. 
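
As a reminder of how the χ² test statistic is built up from a contingency table, here is a minimal Python sketch. The function name and the counts are hypothetical (an invented trial in which 60 of 100 patients on a new drug and 45 of 100 on an existing drug met the success criterion). Each expected value is taken to be (row total × column total)/(overall total), and the contributions (O − E)²/E are summed.

```python
def chi_squared_statistic(observed):
    """Return the chi-squared test statistic and degrees of freedom
    for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    overall_total = sum(row_totals)
    chi_squared = 0.0
    for i, row in enumerate(observed):
        for j, observed_count in enumerate(row):
            expected = row_totals[i] * column_totals[j] / overall_total
            chi_squared += (observed_count - expected) ** 2 / expected
    degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_squared, degrees_of_freedom


# Hypothetical counts: rows are new drug and existing drug,
# columns are success and failure.
table = [[60, 40],
         [45, 55]]
chi_squared, degrees_of_freedom = chi_squared_statistic(table)
print(f"chi-squared = {chi_squared:.3f} on {degrees_of_freedom} degree(s) of freedom")
```

For these invented counts the statistic is about 4.511 on 1 degree of freedom. This lies between the 5% critical value (3.841) and the 1% critical value (6.635), so the null hypothesis would be rejected at the 5% level but not at the 1% level.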


Interval scale data from clinical trials are quite commonly analysed using 
z-tests or t-tests. A number of such tests have been described in Units 7 
and 10: 


• The two-sample z-test and two-sample t-test, which are used to test 
the null hypothesis that the means of two populations are equal. 


• The one-sample z-test and one-sample t-test, which are used to test 
the null hypothesis that a population mean equals some specified 
value. 


• The matched-pairs z-test and matched-pairs t-test, which are used to 
test the null hypothesis that the mean difference between two 
responses is equal to zero. (Recall from Subsection 4.2 of Unit 10 that 
the matched-pairs z-test and matched-pairs t-test are just the 
corresponding one-sample tests performed on the differences within 
pairs.) 


The appropriate z- or t-test to use will depend on the number of samples, 
the sample size or sample sizes, whether or not the data are matched, and 
what assumptions are satisfied. These issues have been discussed in 
Units 7 and 10 and will be returned to in Unit 12. Here we will only 
mention again the relationship between matching and clinical trial design. 


Section 3 described the advantages and difficulties of using both a 
crossover trial and a matched-pairs trial. When each subject acts as his or 
her own control, or when each can be paired with a control subject who is 
similar in those features known to be important to the experiment, then it 
is possible to eliminate a lot of unwanted variability from the experiment. 
Both of these trial designs produce data which are in the form of matched 
pairs. In a crossover trial, there are two measurements for each person: 
one for each treatment. These are clearly paired. With a matched-pairs 
trial, the data from each person are paired with the data from the other 
half of the matched pair. 
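
To show how paired data of this kind feed into a test, the sketch below computes a matched-pairs t statistic as a one-sample t statistic on the within-pair differences. The function name and the six pairs of measurements are hypothetical, and the test assumes that the differences can reasonably be modelled by a normal distribution.

```python
from math import sqrt
from statistics import mean, stdev


def matched_pairs_t(first, second):
    """Return the matched-pairs t statistic and its degrees of freedom."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    # One-sample t statistic on the differences, testing a mean difference of zero.
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Hypothetical measurements for six subjects under two treatments.
treatment_a = [12, 15, 11, 14, 13, 16]
treatment_b = [10, 14, 12, 11, 11, 13]
t, degrees_of_freedom = matched_pairs_t(treatment_a, treatment_b)
print(f"t = {t:.3f} on {degrees_of_freedom} degrees of freedom")
```

Here t is about 2.712 on 5 degrees of freedom. The 5% critical value for a two-sided t-test with 5 degrees of freedom is 2.571, so for these invented data the null hypothesis of zero mean difference would be rejected at the 5% level.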


The following summarises the above discussion, regarding choice of 
statistical test. 

Choice of hypothesis test, based on data and design 


Type of data      Design               Test 

Categorical                            χ² test 
Interval scale    Group comparative    two-sample z- or t-test 
Interval scale    Matched pairs        matched-pairs z- or t-test 
Interval scale    Crossover            matched-pairs z- or t-test 
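
The choice of test summarised above could be written as a simple lookup, as in the following sketch. The function name and the exact strings used for data types and designs are hypothetical. The χ² test is returned for categorical data on the basis of the discussion at the start of this subsection, and ordinal data are rejected because suitable tests are not taught in this module.

```python
def choose_test(data_type, design=None):
    """Suggest a hypothesis test from the type of data and the trial design."""
    if data_type == "categorical":
        return "chi-squared test"
    if data_type == "interval scale":
        if design == "group comparative":
            return "two-sample z- or t-test"
        if design in ("matched pairs", "crossover"):
            return "matched-pairs z- or t-test"
        raise ValueError(f"unrecognised design: {design!r}")
    if data_type == "ordinal":
        raise ValueError("tests for ordinal data are not covered in this module")
    raise ValueError(f"unrecognised data type: {data_type!r}")


print(choose_test("interval scale", "crossover"))
print(choose_test("categorical"))
```

Choosing between the z- and t-versions of a test still depends on the sample sizes and on which assumptions are satisfied, so a lookup like this is only a first step.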


Exercises on Section 4 


Exercises 6 and 7 will refresh your memory of the χ² test that you met in 
Unit 8 and the t-test that you met in Unit 10. They illustrate how these 
tests can be applied to data from clinical trials. For convenience, Table 28 
from Unit 8 and Table 2 of Unit 10 are reproduced below. (Versions of 
these tables are also given in the Handbook.) These give critical values for 
these tests. 


Table 3 Table of critical values of χ² 


Degrees of    Critical values of χ² 
freedom       at significance level 

              5%        1% 
1             3.841     6.635 
2             5.991     9.210 
3             7.815     11.345 
4             9.488     13.277 
5             11.070    15.086 
6             12.592    16.812 
7             14.067    18.475 
8             15.507    20.090 
9             16.919    21.666 
10            18.307    23.209 
11            19.675    24.725 
12            21.026    26.217 




Table 4 5% critical values for a two-sided Student’s t-test 


Degrees       Critical value    Degrees       Critical value 
of freedom    (t_c)             of freedom    (t_c) 

1             12.706            21            2.080 
2             4.303             22            2.074 
3             3.182             23            2.069 
4             2.776             24            2.064 
5             2.571             25            2.060 
6             2.447             26            2.056 
7             2.365             27            2.052 
8             2.306             28            2.048 
9             2.262             29            2.045 
10            2.228             30            2.042 
11            2.201             31            2.040 
12            2.179             32            2.037 
13            2.160             33            2.035 
14            2.145             34            2.032 
15            2.131             35            2.030 
16            2.120             36            2.028 
17            2.110             37            2.026 
18            2.101             38            2.024 
19            2.093             39            2.023 
20            2.086             40            2.021 


Exercise 6 Testing a new contraceptive pill 


A drug company wished to test the efficacy of a new oral contraceptive by 
trying it out on volunteers. So 2000 volunteers were allocated randomly to 
two groups: the experimental group of 1000 women took the new 
contraceptive whilst the control group of 1000 women took an existing 
contraceptive. At the end of a one-year period, each woman taking part in 
the test was recorded as either having conceived or not. Suppose that only 
one of the 1000 women taking the new contraceptive had conceived by the 
end of the year, whereas 15 out of the 1000 women taking the existing 
contraceptive had conceived. 


(a) What type of data is involved? 

(b) Tabulate the data in an appropriate contingency table. 

(c) Write down the appropriate null and alternative hypotheses. 

(d) Carry out the test. 

(e) What do you conclude? 


Exercise 7 A drug for altering heart rate 


Eight people took part in a crossover trial whose purpose was to discover 
whether a new drug alters people’s heart rate. The trial was a double-blind 
trial and the control treatment was to give each person a placebo. The 
data from this trial are in Table 5. 


Table 5 Heart rates of eight participants in a crossover trial 


Heart rate (beats per minute) 


Subject Drug Placebo 
1 90 105 
2 82 88 
3 95 90 
4 80 89 
5 88 80 
6 75 110 
7 83 84 
8 90 100 


(a) What type of data is involved? 
(b) What hypothesis test is appropriate for these data? 


(c) What distributional assumptions are necessary in order to use this 
test? 


(d) Write down the appropriate null and alternative hypotheses. 


(e) Carry out the test. What do you conclude? Does the new drug alter 
people’s heart rate? 





5 Drugs in society 


Much of the discussion in this unit so far has been about the effectiveness 
of a drug. Another, equally important, aspect of drug-testing is assessing 
the unwanted effects of drugs. In this section we shall describe some ways 
in which these are discovered, and look at the problems involved in 
ensuring that drugs are as safe as possible. 




5.1 Side effects 


The unwanted effects of drugs, side effects, can be of a number of 
different kinds. Here are some examples. 


• A drug may make people feel drowsy, suffer from hallucinations or 
have headaches. Such effects may be mild or serious. Even if they are 
only mild, however, they may cause patients to stop taking the drug 
and thus reduce its usefulness, especially if the disease itself is not 
very serious. 


• A drug may interfere with the function of particular organs in the 
body. It might speed up the heart, damage the liver or cause birth 
deformities. Many of the serious hazards of drugs are of this kind and 
may be permanent, but effects of this kind may also be milder. 





[Cartoon caption: ‘Listen, when the side effects of this medication 
kick in, you'll forget what was wrong in the first place!’] 


Certain unwanted effects of drugs are common. For example, many people 
taking the pain-reliever codeine may become constipated. Other effects are 
rare. Clearly a non-serious, uncommon effect is of little importance. It is 
not at all worrying if, one in a thousand times, a drug causes a headache. 
But if a drug causes serious liver damage one in a thousand times, then it 
is most probably not a useful drug. The question of how important an 
unwanted effect is depends not only on the severity of the effect but also 
on the severity of the disease being treated. Severe liver damage as a 
consequence of a treatment for a headache would be unacceptable, whereas 
if a drug company developed a cure for rabies (which is almost invariably 
fatal once its symptoms become apparent), then even a treatment with a 
side effect which killed one in ten of the recipients would be considered a 
major advance. Whether or not the benefits of a new drug outweigh the 
risks is precisely what the experts at the EMA aim to establish in order to 
make their decision on a new drug approval. (Liver damage has indeed 
been linked to the over-the-counter pain-relief drug paracetamol, but only 
after overdose or constant long-term use.) 


Sometimes investigators in clinical trials have a pretty good idea of what 
side effects they expect from a new drug. There may be suggestions from 
preliminary experiments on animals that a particular effect may be a 
problem; other related drugs may show such an unwanted effect; or the 
standard treatment may have an undesirable side effect which, it is hoped, 
will not occur with the new treatment. In all these cases, it is necessary to 
devise a method of measuring the effect in question. 


This process is similar to that described in Subsection 2.2 for measuring 
the desired effects of drugs. 


Sometimes investigators have some idea of the effects for which they are 
looking, but do not know them exactly. It is well known that quite a lot of 
drugs have effects, such as headache, constipation and tremor symptoms, 
which are not uncommon in the population as a whole, whether or not 
drugs have been taken. In many clinical trials the severity of such 
symptoms is entered on a checklist of symptoms which, it is known from 
past experience, are often connected with drug treatment. An example 
checklist is given below. 


Symptom checklist 


Symptom                     Noticed in past    Severity of symptoms 
                            two weeks?         on 0-3 scale* 

Heart pounding 
Sleepiness 
Headache 
Sweating 
Dizziness 
Trembling 
Blurred vision 
Eye strain 
Difficulty passing urine 
Constipation 
Diarrhoea 
Nausea 
Vomiting 
Indigestion 
Lack of appetite 
Funny taste in mouth 
Dry mouth 
Difficulty in breathing 
Rash 
Itchiness 


* 0 = absent, 1 = mild, 2 = moderate, 3 = severe 





[Cartoon caption: ‘I didn't experience any of the side effects 
listed in the enclosed literature. Should I be concerned?’] 


Very occasionally in clinical trials, serious unexpected adverse events occur. 
(An adverse event is a side effect that has a negative effect on a patient.) 
These are events such as death, life-threatening events, hospitalisation, 
birth defects and disablement. If it is probable that the unexpected adverse 
event was caused by the drug being tested, then investigators must report 
the event to the authorities. Throughout the EU, these are entered into a 
single electronic system called EudraVigilance. Investigators will consider 
whether the trial should be stopped. During the earlier phases of testing, 
investigators will stop the trial immediately if adverse reactions are severe. 





Example 9 Phase 1 trial of TGN1412 


An extreme and exceptional case occurred during 2006, when the very first 
phase 1 trial of a new drug, TGN1412, designed to target the immune 
system, had to be stopped. The drug was administered in doses 500 times 
lower than that found to be safe in animals. However, soon after the trial 
started, six volunteers were hospitalised: four had multiple organ failure 
and all six experienced cytokine release syndrome, which caused severe 
inflammation of the skin. Fortunately all the volunteers survived, the last 
being released from hospital after three months, though one had to have 
fingers and toes amputated and it was reported that they all remain at a 
long-term increased risk of developing an immune-system-related illness. 
The incident was put down to ‘unpredicted biological action in humans’. 
(See Suntharalingam et al., 2006 and Expert Scientific Group, 2006.) 


Example 10 Phase 2 trial of fialuridine 


In 1993, five patients died during a phase 2 trial of the new anti-viral drug 
fialuridine, designed to target hepatitis B. No sign of toxicity was detected 
during phase 1 trials, in which 67 subjects received fialuridine for two or 
four weeks. Then in the thirteenth week of the phase 2 trial, one patient 
suddenly developed hepatic toxicity (liver damage caused by a chemical). The trial 
was stopped. Even after stopping the drug, seven patients went on to 
develop hepatic toxicity; five died and two survived after liver transplants. 
This severe delayed reaction could not have been predicted; fialuridine had 
gradually accumulated in liver DNA. (See McKenzie et al., 1995.) Any 
similar new drug would now be tested on animals for a longer period, to 
check for this type of damage. 





5.2 Unexpected side effects post-licence 


A completely different problem is presented by those side effects that are 
not discovered until after a drug has been licensed. A well-publicised 
example of this was the drug thalidomide (see Thalidomide Society, 2006). 
This was a sedative introduced in the UK in 1958. When taken by women 
in early pregnancy, it sometimes produced very severe effects on the 
physical development of the unborn children. A more recent case was the 
drug efalizumab, used to treat chronic psoriasis (red, scaly patches on the 
skin), which was withdrawn in 2009 after some reports of fatal infections 
in long-term users; we shall return to this example later. Yet another 
historical example was the drug practolol, a beta-blocker, which will now 
be described in more detail. 





Example 11 Withdrawal of practolol 


Practolol (marketed as Eraldin) was introduced in 1970 as a beta-blocker 
for the management of heart conditions. The particular advantage of 
practolol was that it appeared to have fewer unwanted effects than other 
beta-blockers available at that time. The first indications that the drug 
was not, after all, safe, came in two letters published in the British Medical 
Journal in 1974. The first, by Felix and Ive (1974), described a 
characteristic rash that had developed in fourteen patients who had 
received long-term practolol therapy. The second letter, by Wright (1974), 
also reported a rash, but a few of his patients had also developed 
conjunctivitis (inflammation of the membrane that covers the front of the 
eye) — leading to severe and permanent visual impairment. Practolol 
seemed to be the common feature in all these cases. 


The company that marketed this drug, Imperial Chemical Industries (ICI), 
sent a letter to all doctors and pharmacists in the UK warning them of 
these possible side effects, requesting information on similar cases, and 
advising immediate cessation of practolol therapy in any patient 
developing such symptoms. 


As more data were gathered, it became clear that the adverse reaction was 
definitely associated with practolol. It also appeared that similar reactions 
did not occur with other beta-blockers. Practolol was finally withdrawn 
from the market in October 1975. By 1981, there was a total of about 
2450 reports from doctors about reactions to the drug, including 
40 deaths, 1130 cases of eye damage and 1250 skin reactions. (See also 
Abraham and Davis, 2006.) 





It is important, when judging the number of reported eye reactions in 
Example 11, to consider the amount of practolol that was used during the 
period. This type of amount is usually measured in patient-years. 


Patient-years 


If a single patient takes a drug for one year, then that constitutes one 
patient-year of use of the drug. 


If two patients take the drug for six months each, then the usage is 
half a patient-year each, which again comes to one patient-year. 
Assuming that the patients take the same dose of the drug each day, 
the amount of drug consumed by one patient taking it for a year is 
the same as the amount consumed by two patients over six months, so 
a patient-year corresponds to the use of a certain amount of the drug. 


If twelve patients each take the drug for a month, or if 365 patients 
each take it for a day, the total usage is still one patient-year. 
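
The arithmetic of patient-years can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and a month and a day are treated as 1/12 and 1/365 of a year respectively, as in the examples above. Exact fractions are used so that the totals come out as exactly one patient-year.

```python
from fractions import Fraction


def patient_years(durations_in_years):
    """Total usage of a drug, in patient-years, from each patient's duration of use."""
    return sum(durations_in_years)


one_patient_one_year = patient_years([Fraction(1)])
two_patients_six_months = patient_years([Fraction(1, 2)] * 2)
twelve_patients_one_month = patient_years([Fraction(1, 12)] * 12)
many_patients_one_day = patient_years([Fraction(1, 365)] * 365)

print(one_patient_one_year, two_patients_six_months,
      twelve_patients_one_month, many_patients_one_day)
```

Each of the four cases gives a total of one patient-year.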


The total amount of practolol prescribed during the five years of its use 
amounted to approximately a million patient-years. The question that 
must be answered is not 


How can such occurrences with new drugs be prevented? 
but 
How can new drugs that have serious side effects be identified quickly? 


It is perhaps necessary first to justify the implication that such occurrences 
cannot be prevented (except by prohibiting all new drugs). A major factor 
is the length of time during which patients have to use a drug before the 
symptoms become apparent. With the exception of certain special cases, 
such as clinical trials of treatments for cancer, few clinical trials last for 
longer than a year. It is only when drugs are marketed that they are used 
for longer periods. One could argue that drugs should be tested in clinical 
trials for as long as it is proposed to use them in treatment. This would 
cause great delays in introducing drugs, but for some drugs this probably 
does not matter. Practolol, however, was a significant advance for many 
patients, because it provided genuine relief for people suffering from 
potentially fatal heart conditions. Few would wish to delay a drug that 
showed a good prospect of saving lives on the grounds of unlikely, although 
untested, possibilities of unknown risks. 

Another obvious factor is simply the rarity of the effect. If a drug causes 
severe damage to only one in a thousand of the people who take it, then 
quite a lot of people may have to take it before even the first case will be 
seen, and even more must take it before it can be known with any 
certainty that it is the drug that is causing the effect. 
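
A short calculation shows how many people 'quite a lot' might be. The sketch below assumes, purely for illustration, that each patient independently has the same risk of the side effect; real patients differ, so the figures are only indicative. The function names are hypothetical.

```python
from math import ceil, log


def prob_at_least_one_case(risk, n_patients):
    """Probability that at least one of n_patients suffers the side effect."""
    return 1 - (1 - risk) ** n_patients


def patients_needed(risk, probability):
    """Smallest number of patients for which the chance of seeing at least
    one case reaches the given probability."""
    return ceil(log(1 - probability) / log(1 - risk))


risk = 0.001  # one in a thousand
print(f"Chance of at least one case among 1000 patients: "
      f"{prob_at_least_one_case(risk, 1000):.3f}")
print(f"Patients needed for a 95% chance of at least one case: "
      f"{patients_needed(risk, 0.95)}")
```

Under these assumptions, even when 1000 people take the drug there is only about a 63% chance that a single case will have appeared, and nearly 3000 people are needed before the chance of seeing at least one case reaches 95%.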





Example 12 Withdrawal of efalizumab 


Efalizumab (marketed as Raptiva) was approved by the EMA in 2004 for 
the treatment of chronic psoriasis, a disease that attacks the immune 
system and causes red, scaly patches on the skin. Short-term clinical trials 
showed efalizumab to be safe, though there was a question over the risk of 
infection with long-term use. 


By the end of 2008, the drug manufacturers had made the EMA aware of a 
number of cases of serious infections in long-term users, most notably four 
long-term users who had developed a fatal brain infection (progressive 
multifocal leukoencephalopathy), three of whom had died. 


In February 2009 the CHMP met to review the risks and benefits of 
efalizumab; its benefits in the treatment of psoriasis were only modest, 
whereas there was a risk of serious side effects. The CHMP recommended 
suspension of the drug’s licence unless a subgroup of patients could be 
identified in which the benefits outweighed the risks. In May the drug 
licence holders voluntarily withdrew efalizumab, and in June the EMA 
withdrew the licence (EMA, 2009; see also DeFrancesco, 2009, Seminara 
and Gelfand, 2010). 





There are other factors that can easily prevent a relatively rare side effect 
being quickly detected. One is that not all the cases that occur are notified 
to the workers carrying out the follow-up; another is that the patients 
involved are often using more than one drug. 


Although it is vital to carry out thorough, well-designed clinical trials of 
drugs, and although statistical techniques can help to decide whether or 
not the drug is useful, these trials and statistical techniques cannot, by 
themselves, guarantee that a drug will be trouble-free when it is marketed. 
The greater the discrepancy between the conditions in which the clinical 
trials of a drug are conducted and the conditions in which the drug is 
actually used, the greater the chance that unpredicted side effects of the 
drug will appear. When testing a drug, the statistical analysis of data from 
clinical trials helps a drug company to decide whether to apply for a 
product licence to market that drug, but further data need to be collected 
after marketing the drug. In order to monitor how the drug actually 
performs once it has been marketed, some form of post-marketing 
surveillance is needed. 


5.3 Post-marketing surveillance 


The pharmaceutical industry and national regulators throughout the EU 
submit all adverse drug reaction reports to EudraVigilance (the central 
electronic system). This includes both individual case safety reports, which 
are submitted if an individual patient has a serious adverse reaction to a 
licensed drug in general use, and unexpected serious adverse reactions 
that occur during clinical trials of unlicensed drugs. The aim of such a 
system is to allow any potential drug safety issue to be detected and 
investigated as early as possible. 





Where necessary, these drug safety signals are referred to the EMA. 
Within the EMA there is a separate committee who look at all aspects of 
drug safety, called the Pharmacovigilance Risk Assessment Committee 
(PRAC). Their responsibilities include scrutinising referrals from the 
EudraVigilance system, monitoring the system’s effectiveness and 
maintaining a list of drugs that should be subject to additional monitoring. 
PRAC will make recommendations to the CHMP and other committees on 
whether a drug’s licence should be changed or withdrawn. 


The trouble with a method like this is that doctors will not, generally, 
suspect an adverse drug reaction when they see something unfamiliar but 
will, quite rightly, send the patient to the appropriate specialist. So the 
reports submitted to EudraVigilance will tend to be biased towards the 
symptoms that doctors expect to see as adverse reactions. So a less 
expected reaction is likely to be missed until a sharp-witted specialist, who 
sees enough cases, notices that more patients suffering from a particular 
condition are taking a particular drug than would be expected. In the case 
of the connection of both rash and conjunctivitis with practolol, there were 
distinct, unfamiliar features to alert a specialist that something unusual 
was happening. The situation is much more difficult when a drug causes a 
disorder that is common in the whole population. 


How can the situation be improved? One approach is to monitor all, or a 
sample, of the patients taking the drug for a period after it comes onto 
the market. Such studies are sometimes called phase 4 trials, and it 
is a further responsibility of PRAC to assess and evaluate these 
population-based studies. 





Example 13 Withdrawal of rofecoxib 


Rofecoxib (marketed as Vioxx) is a drug that was used to target arthritis 
and other pain-causing conditions. It was withdrawn after 
population-based studies had shown an increased risk of heart attack and 
stroke. 


The drug came onto the market in 1999. A study was published in 2000, 
comparing the effectiveness and side effects of rofecoxib and another drug 
used to treat arthritis, naproxen (Bombardier et al., 2000). It found that 
the incidence of heart attacks was four-fold higher in the rofecoxib group. 
Rofecoxib’s manufacturer, Merck Sharp & Dohme, responded by claiming 
that the difference was due to naproxen having a protective effect on heart 
attacks. 


In 2000, Merck commenced a three-year study whose primary aim was to 
assess the effectiveness of rofecoxib for a new purpose, but it had the 
additional aim of assessing cardiovascular risk. It was found (Bresalier et 
al., 2005) that the rate of a serious cardiovascular event such as a heart 
attack or stroke, after 18 months on rofecoxib, was 1.71 events per 
100 patient-years versus 0.38 events per 100 patient-years for study 
participants taking a placebo. The drug was voluntarily withdrawn from 
the worldwide market by Merck in 2004. (See also Dieppe et al., 2004.) 





How are population-based studies carried out? One method is to select a 
group of patients taking the drug, and to follow them up by means of 
regular medical checks, interviews and records of hospital admissions. Such 
a group is called a cohort. A control group (cohort) may also be selected, 
which is matched with this group in as many respects as possible, but 
those in the control cohort are not taking the drug. 


Activity 22 Selecting groups for post-marketing studies 


The procedure for selecting cohort groups in post-marketing studies is 
similar to the selection of groups for a clinical trial. Is the selection 
procedure identical to that used in a clinical trial? If not, in what 
important ways is it different? 


Selecting experimental and control groups after, rather than before, they 
have received a treatment is unsatisfactory because many factors other 
than the drug treatment will be different for the two groups, though 
investigators will try to take these factors into account. This is also a very 
expensive method of surveillance, and studies may take a long time to 
carry out; for example, it took four years before rofecoxib was withdrawn. 
Depending on the size of the population monitored, rare side effects may 
still go undetected. Other study designs are available that can be used 
more economically to study rare side effects. 


Another approach, called record linkage, consists of obtaining access to a 
patient’s medical record so that links, other than those already known, 
between various ailments and the drugs being taken can be spotted. This 
approach does not require doctors to actively report information on their 
patients, but only requires them to permit researchers to have access to 
the records. For this reason it has many attractions, but it also raises the 
highly controversial issue of the privacy of medical records. 


None of these methods can completely prevent damage to patients from 
the unexpected actions of drugs. All they can do is reduce the time it 
takes to spot the effect so that the drug can be withdrawn, or its use 
limited, as soon as possible. 


6 Case study 




The following case study describes the progress of a particular drug 
through the different phases of testing. Do not worry if you do not follow 
all the medical details. 


Patients who have just had a hip or knee replacement are at high risk of 
blood clots forming in the veins of their legs, which can be dangerous if the 
clot moves to a major organ such as the lungs. Likewise, patients with an 
abnormal heartbeat, called ‘atrial fibrillation’, have a high risk of blood 
clots which can cause a stroke. Patients at high risk of blood clots are 
prescribed anticoagulant drugs, which prevent blood from clotting. The 
main safety concern with any drug that prevents blood from clotting is 
that patients may bleed excessively. Clinical testing of any new 
anticoagulant needs to establish an acceptable trade-off between blood clot 
prevention and bleeding. 


Dabigatran (marketed as Pradaxa) is one such anticoagulant. Dabigatran 
was granted marketing authorisation by the European Commission in 2008 
after positive evaluation by the CHMP (EMA, 2008, 2013). We will 
describe all the clinical studies carried out in humans before the drug was 
approved. 





A blood clot at the centre of 
a blood vessel 


6.1 Phase 1 of testing dabigatran 


In phase 1, the drug is usually given in increasing doses to healthy 
volunteers so as to evaluate biological action and safety. 


After pre-clinical testing, which included testing on rats and rabbits, two 
phase 1 studies with healthy male volunteers were carried out. (See 
Stangier et al., 2007.) 


In the first study, 40 subjects were randomised to one of five groups and, 
within each group, subjects were randomised so that two received placebo 
and six received a single dose of dabigatran. The first group received one 
10 mg dose; in the other four groups, the single dose was, respectively, 
30 mg, 100 mg, 200 mg and 400 mg. Blood samples were taken from the 
volunteers to observe the rate at which the drug was eliminated from the 
body. 
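
The two-stage randomisation described for the first study can be sketched 
in a few lines of Python (used here purely for illustration). The function 
name, the subject numbering 1 to 40 and the seed are assumptions made for 
this sketch; the published study does not say how its allocation was 
generated. 

```python
import random

def allocate_phase1(subject_ids, doses_mg, n_placebo=2, seed=None):
    """Randomise subjects to equal-sized dose groups, then randomise
    placebo versus active drug within each group."""
    rng = random.Random(seed)
    subjects = list(subject_ids)
    rng.shuffle(subjects)                       # random order decides the dose group
    group_size = len(subjects) // len(doses_mg)
    allocation = {}
    for g, dose in enumerate(doses_mg):
        group = subjects[g * group_size:(g + 1) * group_size]
        placebo = set(rng.sample(group, n_placebo))   # chosen at random within the group
        for s in group:
            allocation[s] = (dose, "placebo" if s in placebo else "dabigatran")
    return allocation

# 40 volunteers, five single-dose levels (mg); the seed is hypothetical
allocation = allocate_phase1(range(1, 41), [10, 30, 100, 200, 400], seed=2007)
for subject in sorted(allocation)[:5]:
    print(subject, allocation[subject])
```

Each dose group then contains eight volunteers, of whom two receive placebo 
and six receive dabigatran, matching the design described above. 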


In the second study, again with 40 volunteers, doses were given three times 
daily for seven days. The first group received doses of 50 mg and 
subsequent groups were given higher doses up to 400 mg. 


At a dose of 400 mg three times a day, some volunteers bruised where they 
had been punctured by needles, and some had bleeding gums. The main 
safety concern was the occurrence of a major bleed. Major bleeds are 
classified according to a strict definition; any bleeding not classified as a 
major bleed is considered to be a minor bleed. No major bleeds were 
observed during the phase 1 testing. 


An X-ray image of a patient after a hip replacement 





‘That’s great, but it was supposed to be a Laxative.’ 


6.2 Phase 2 of testing dabigatran 


The studies in phase 2 are usually the first studies in which the drug is 
given to patients with the condition that the drug is designed to help. 


The aims of the first study in patients (called BISTRO I) were to check the 
drug’s effectiveness, form hypotheses about the optimal dose and assess the 
drug’s safety, in particular with respect to bleeding. 


The study was conducted over nine months across 11 sites in Sweden and 
seven in Norway. (See Eriksson et al., 2004.) Dabigatran was given for six 
to ten days to patients after a hip replacement operation, with patients 
divided into groups according to when they entered the study. The first 
group to enter the study received two doses of 12.5 mg each day and the 
dose was steadily increased for subsequent groups. The ninth group 
received two doses of 300 mg each day. 


Guidelines were carefully set up in advance with regard to stopping the 
study if the number of patients with bleeding problems became too high as 
dose size increased. 


• All major bleeds were recorded, and if 5% or more of patients 
  experienced major bleeds the study would be stopped. 


• Any other bleeding, no matter how small, was recorded as a minor 
  bleed. However, some bleeding from the surgical hip replacement site 
  was expected, so only excessive bleeding there was recorded. 


The other issue was blood clots, which the drug was meant to prevent. 
Under guidelines for this, the dose level was to be increased for the next 
group of patients to the next dose up if 20% or more of patients who 
received the current dose experienced a blood clot. 
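
As a rough illustration, the two guidelines can be written as a decision 
rule in Python. This is a sketch under stated assumptions, not the 
investigators' protocol: it assumes that both thresholds are applied to a 
single dose group, that the major-bleed rule takes priority, and that 
nothing is decided automatically when neither threshold is reached. The 
function name and the returned labels are invented for the example. 

```python
def escalation_decision(n_safety, n_major_bleeds, n_efficacy, n_clots,
                        stop_fraction=0.05, escalate_fraction=0.20):
    """Apply the two pre-set guidelines to the data from one dose group.

    n_safety   -- patients assessed for bleeding
    n_efficacy -- patients assessed for blood clots
    """
    if n_safety <= 0 or n_efficacy <= 0:
        raise ValueError("each assessment needs at least one patient")
    # Safety rule: 5% or more of patients with a major bleed stops the study
    if n_major_bleeds / n_safety >= stop_fraction:
        return "stop study"
    # Efficacy rule: 20% or more with a clot moves the next group up a dose
    if n_clots / n_efficacy >= escalate_fraction:
        return "increase dose"
    return "no rule triggered"   # assumption: left to the investigators

# Lowest-dose group in Table 6: no major bleeds in 27 patients, 5 clots in 24
print(escalation_decision(27, 0, 24, 5))   # increase dose
# Hypothetical group: 2 major bleeds in 30 patients
print(escalation_decision(30, 2, 27, 4))   # stop study
```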


There were 314 patients enrolled on the study, with 289 receiving at least 
one dose and 262 patients completing the study, including a follow-up 
check four to six weeks after surgery. Although no patients experienced a 
major bleed, the study was stopped at a dose of 300 mg twice daily 
because two patients experienced minor bleeding from multiple sites. Drug 
safety and bleeding were evaluated in all 289 patients who received at least 
one dose, but data to assess drug effectiveness in preventing blood clots 
were only available for 225 patients. 


Some results are given in Table 6. It can be seen that few patients had a 
blood clot, but minor bleeds were common. The total number of events is 
given in the last row. 


Table 6 Number of bleeds and clots in the dabigatran phase 2 trial 

   Dose regime           Data on bleeds          Data on clots 
Level     Times       No. of      Minor       No. of      Clots 
(mg)      per day     patients    bleeds      patients 

 12.5     2              27          2           24          5 
 25       2              28          9           21          2 
 50       2              30         18           27          4 
100       2              40         33           31          6 
150       1              41         39           33          3 
150       2              29         26           21          2 
200       2              28         22           21          4 
300       1              46         41           33          2 
300       2              20         16           14          0 

Total                   289        206          225         28 


Example 14 Does dosage affect the number of minor bleeds? 


As there were no major bleeds, there is clearly no evidence that the 
dose regime affects the chance of a patient having a major bleed, but there 
is the question of whether daily dose level affects the number of minor 
bleeds. The numbers are quite small, so to examine this we will combine 
results into three categories (note that these are total daily doses): 

• total daily dose less than or equal to 100 mg 
• total daily dose 150 mg or 200 mg 
• total daily dose greater than or equal to 300 mg. 


Thus, for the first category there are 27 + 28 + 30 = 85 patients, of whom 
2 + 9 + 18 = 29 had minor bleeds, so that 85 − 29 = 56 patients had no 
bleed. Results for the three groups give the following contingency table: 


Table 7 Number of patients with minor bleeds and number without 
minor bleed, by daily dose level 

                            Total daily dose 
                    ≤ 100 mg   150 or 200 mg   ≥ 300 mg    Total 
With minor bleed       29           72            105        206 
Without bleed          56            9             18         83 
Total                  85           81            123        289 
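
The grouping that produces Table 7 from Table 6 can be checked with a 
short Python sketch (Python is used purely for illustration). The data are 
the bleed columns of Table 6; the variable and function names are 
assumptions made for this example. 

```python
# (dose level in mg, times per day, no. of patients, minor bleeds) from Table 6
table6 = [
    (12.5, 2, 27, 2), (25, 2, 28, 9), (50, 2, 30, 18),
    (100, 2, 40, 33), (150, 1, 41, 39), (150, 2, 29, 26),
    (200, 2, 28, 22), (300, 1, 46, 41), (300, 2, 20, 16),
]

def dose_category(total_daily_mg):
    """Assign a total daily dose to one of the three categories."""
    if total_daily_mg <= 100:
        return "<= 100 mg"
    if total_daily_mg in (150, 200):
        return "150 or 200 mg"
    if total_daily_mg >= 300:
        return ">= 300 mg"
    raise ValueError(f"no category defined for {total_daily_mg} mg")

table7 = {}
for level, times, patients, bleeds in table6:
    category = dose_category(level * times)      # total daily dose
    with_bleed, without = table7.get(category, (0, 0))
    table7[category] = (with_bleed + bleeds, without + patients - bleeds)

for category, (with_bleed, without) in table7.items():
    print(f"{category:15s} with bleed: {with_bleed:3d}  without: {without:3d}")
```

Note that 100 mg twice daily gives a total of 200 mg, so that regime falls 
in the middle category rather than the first. 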


The null and alternative hypotheses are: 

$H_0$: Total daily dose of drug and the number of patients having 
a minor bleed are independent. 

$H_1$: There is a relationship between the total daily dose of drug 
and the number of patients having a minor bleed. 


Recall from Subsection 4.2 in Unit 8 that the Expected values are 
calculated from 

$$\text{Expected} = \frac{\text{Row total} \times \text{Column total}}{\text{Overall total}},$$

which gives the following Expected table. 

                            Total daily dose 
                    ≤ 100 mg   150 or 200 mg   ≥ 300 mg 
With minor bleed    60.5882       57.7370       87.6747 
Without bleed       24.4118       23.2630       35.3253 


Note that the Expected values are greater than 5, so it is appropriate to 
use the $\chi^2$ test. 


The Residuals are obtained from 

$$\text{Residual} = \text{Observed} - \text{Expected}.$$

Thus, for example, the Residual for the first cell is 
29 − 60.5882 = −31.5882. The following is the Residual table. 

                            Total daily dose 
                    ≤ 100 mg   150 or 200 mg   ≥ 300 mg 
With minor bleed    −31.5882      14.2630       17.3253 
Without bleed        31.5882     −14.2630      −17.3253 


For the first cell, the contribution to $\chi^2$ is given by 

$$\chi^2 \text{ contribution} = \frac{(\text{Residual})^2}{\text{Expected}} = \frac{(-31.5882)^2}{60.5882} \approx 16.4688.$$

Repeating the calculation for all six cells results in this $\chi^2$ table: 

                            Total daily dose 
                    ≤ 100 mg   150 or 200 mg   ≥ 300 mg 
With minor bleed    16.4688        3.5234        3.4236 
Without bleed       40.8743        8.7449        8.4972 


Hence the value of the $\chi^2$ test statistic is 

$$16.4688 + 3.5234 + 3.4236 + 40.8743 + 8.7449 + 8.4972 \approx 81.532.$$

The number of degrees of freedom for a 2 × 3 contingency table is 

$$(2 - 1) \times (3 - 1) = 2.$$


Hence, from Table 3 (Exercises on Section 4), the critical value at the 5% 
significance level is 5.991, and at the 1% significance level it is 9.210. 


Since the test statistic, 81.532, is much greater than the 1% critical value, 
9.210, we reject $H_0$ in favour of $H_1$ at the 1% significance level and 
conclude that there is strong evidence of a relationship between the total 
daily drug dose and the number of patients having a minor bleed. Looking 
at the data in Table 7 (or at the Residual table), bleeds seem less likely 
when the daily dose is 100 mg or less compared with when it is 150 mg or 
more. 
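
The hand calculation in this example can be checked with a short Python 
sketch (Python is used here purely for illustration; the function name is 
an assumption). It follows the same steps: Expected values from the row 
and column totals, then the sum of (Observed − Expected)²/Expected over 
all cells. The critical value 9.210 is the one quoted above. 

```python
def chi_squared(observed):
    """Return (expected table, test statistic, degrees of freedom)
    for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected = Row total x Column total / Overall total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # Sum the contribution (Observed - Expected)^2 / Expected over all cells
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return expected, statistic, df

# Table 7: rows are with / without minor bleed; columns are the dose groups
table7 = [[29, 72, 105], [56, 9, 18]]
expected, statistic, df = chi_squared(table7)

print([round(e, 4) for e in expected[0]])   # first row of the Expected table
print(round(statistic, 3), df)              # 81.532 2
print(statistic > 9.210)                    # True: reject H0 at the 1% level
```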





Activity 23 Does dosage affect the number of blood clots? 


One purpose of phase 2 is to start to learn whether the drug is effective. 
The purpose of dabigatran is to reduce the risk of blood clots. Table 6 
gives data on the number of blood clots by dose regime. Combine the dose 
categories into three groups, using the same dose regime groups as in 
Example 14. Form a contingency table appropriate for testing whether the 
total daily dose of drug and the number of patients having a blood clot are 
independent. Perform the test and report your conclusion. 


Phase 2 testing continued with a larger study, called BISTRO II, that 
shared the same aims as the BISTRO I study (Eriksson et al., 2005). 


It was a double-blind randomised controlled trial of 1973 patients who had 
just undergone a hip or knee replacement, carried out across 60 centres in 
Europe and two centres in South Africa. Given the inconclusive results of 
the BISTRO I study and the need for sensible dosing schedules to be 
determined for phase 3 testing, patients in the treatment group were 
randomised to one of four dosing schedules: 50 mg twice a day, 150 mg 
twice a day, 300 mg once a day and 225 mg twice a day. Patients 
randomised to the control group were given an existing treatment proven 
to reduce the risk of blood clots, enoxaparin. 


Activity 24 Use a placebo as the control treatment? 


Why would it be unethical to compare dabigatran with placebo? 


Activity 25 Double-blind trial — How? 


Dabigatran is given orally in capsule form, while enoxaparin is given as an 
injection. How do you suppose it was possible to achieve blinding? 





Figure 10 (a) A drug in capsule form; (b) an injection of a drug 


Activity 26 Questions of interest? 


For each question that you identify in (a) and (b), state the null 
hypothesis that should be tested. 


(a) What are the main questions involving only dabigatran that should be 
examined after the data have been gathered? 


(b) What are the main questions to examine in comparing dabigatran 
with enoxaparin? 


A summary of some of the main results is given in Table 8. It shows the 
number of patients experiencing major or minor bleeds and the number 
experiencing blood clots. Data on bleeds relate to all patients who received 
at least one dose of a drug in the trial (just under 400 patients for each 
drug/dosage regime). Data on the number of clots determine the efficacy of 
a treatment and are only given for those patients who completed the trial 
(about 300 patients for each drug/dosage regime). 


Table 8 Number of bleeds and clots with dabigatran and enoxaparin in 
BISTRO II 

Drug, dose and                   Data on bleeds            Data on clots 
daily frequency             No. of     Major    Minor     No. of     Clots 
                            patients   bleeds   bleeds    patients 

Dabigatran 50 mg: twice        389        1       18         302       86 
Dabigatran 150 mg: twice       390       16       31         282       49 
Dabigatran 300 mg: once        385       18       37         283       47 
Dabigatran 225 mg: twice       393       15       38         297       39 
Enoxaparin                     392        8       25         300       72 

Total                         1949       58      149        1464      293 


Statistical analysis showed that, compared with enoxaparin, the rate of 
occurrence of clots was significantly lower with dabigatran when it was 
administered at 150 mg twice daily (p = 0.04), 300 mg once daily (p = 0.02) 
or 225 mg twice daily (p = 0.0007). Looking at Table 8, there is a 
suggestion that taking dabigatran at these higher doses increases the risk 
of a major bleed, compared with enoxaparin, but in fact the differences are 
not statistically significant at the 5% significance level. 


When administered in twice-daily doses of 50 mg, dabigatran had 
significantly lower rates of major bleeds than at other levels, and a 
significantly lower rate than enoxaparin. However, at that dosage 
dabigatran seemed no better (and possibly worse) than enoxaparin at 
reducing the rate of clots. 
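
The comparisons described above are easier to see when the counts in 
Table 8 are converted to percentages. The following Python sketch does 
this; it only restates the table and does not reproduce the study team's 
significance tests. The dictionary layout and names are assumptions made 
for this example. 

```python
# Table 8: (no. assessed for bleeds, major bleeds, no. assessed for clots, clots)
bistro2 = {
    "dabigatran 50 mg twice daily":  (389, 1, 302, 86),
    "dabigatran 150 mg twice daily": (390, 16, 282, 49),
    "dabigatran 300 mg once daily":  (385, 18, 283, 47),
    "dabigatran 225 mg twice daily": (393, 15, 297, 39),
    "enoxaparin":                    (392, 8, 300, 72),
}

rates = {}
for regime, (n_bleed, major, n_clot, clots) in bistro2.items():
    major_pct = 100 * major / n_bleed    # percentage with a major bleed
    clot_pct = 100 * clots / n_clot      # percentage with a blood clot
    rates[regime] = (major_pct, clot_pct)
    print(f"{regime:30s} major bleeds {major_pct:4.1f}%   clots {clot_pct:4.1f}%")
```

In these data the clot percentage falls as the dabigatran dose rises, while 
the major-bleed percentage is lowest at 50 mg twice daily, which is the 
trade-off described in the text. 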


The conclusion of the BISTRO II study team was that, with dabigatran, 
both the effectiveness (reduction in the rate of clots) and safety (rate of 
bleeding events) depended on the dose level. They also concluded that the 
three highest doses of dabigatran were significantly more effective than 
enoxaparin at reducing the rate of blood clots, although at these levels 
there appeared to be an increase in bleeds. 


6.3 Phase 3 of testing dabigatran 


In phase 3, treatment with the new drug is compared with existing 
therapies in a wider range of contexts. 


There were three large phase 3 studies initiated in 2004, with results 
published in 2007. Their study set-up and aims were similar to the 
BISTRO II trial: patients who were undergoing hip or knee replacement 
were randomised to the treatment group, receiving dabigatran, or the 
control group, receiving enoxaparin, and effectiveness and safety outcomes, 
particularly bleeding, were compared. Given the study team’s conclusions 
on dosing after the BISTRO II trial, dabigatran was administered once per 
day at a dose of either 150 mg or 220 mg. 


One of the three studies, called RE-MOBILIZE, was carried out in North 
America, where the recommended dose of enoxaparin after hip or knee 
replacement is 60 mg/day, different from the 40 mg dose used throughout 
Europe (RE-MOBILIZE Writing Committee, 2009). The results of the 
RE-MOBILIZE trial were of less relevance to the CHMP in deciding 
whether dabigatran should be licensed within Europe, so will not be 
described further here. 


In one of the other two trials, called RE-MODEL, a total of 2101 knee 
replacement patients located at one of 105 centres in Europe, Australia or 
South Africa were randomised to one of the two treatment groups (150 mg 
or 220 mg dabigatran) or the control group (40 mg enoxaparin) (Eriksson 
et al., 2007a). 


In the third trial, called RE-NOVATE, a total of 3494 hip replacement 
patients located at one of 115 centres in Europe, Australia or South Africa 
were randomised (Eriksson et al., 2007b). 


Results of the latter two trials are given in the tables below. For instance, 
in the 220 mg dose of dabigatran group in the RE-MODEL trial (Table 9), 
the numbers 10/679 mean that, of the 679 patients for whom data are 
available, 10 of these had a major bleed. There were fewer data for blood 
clots than for the safety outcomes, because for some patients the relevant 
data were not collected or were inadequate. 


Table 9 Number of bleeds and clots or death with dabigatran and 
enoxaparin in RE-MODEL (knee replacement) 

                                   Drug and dose 
                              dabigatran          enoxaparin 
                          220 mg      150 mg        40 mg 
Major bleeds              10/679      9/703         9/694 
Minor bleeds              60/679      59/703        69/694 
Blood clots or death      183/503     213/526       193/512 


Table 10 Number of bleeds and clots or death with dabigatran and 
enoxaparin in RE-NOVATE (hip replacement) 

                                   Drug and dose 
                              dabigatran          enoxaparin 
                          220 mg      150 mg        40 mg 
Major bleeds              23/1146     15/1163       18/1154 
Minor bleeds              70/1146     72/1163       74/1154 
Blood clots or death      53/880      75/874        60/897 


In Tables 9 and 10, the proportions of people with blood clots or who died 
are slightly lower with the 220 mg dose of dabigatran than with enoxaparin. 
For example, in the 220 mg dose of dabigatran group of the RE-MODEL 
trial, 183 patients out of 503 patients had a blood clot or died (though 
only 1 patient died in this treatment group), corresponding to 36.4% of 
patients. In the enoxaparin group, 193 out of 512 patients had a blood clot 
or died (there was also only 1 death in this group), corresponding to 37.7% 
of patients. Also, the proportions of people with bleeds are slightly lower 
with the 150 mg dose of dabigatran than with enoxaparin. However, the 
study investigators found no statistically significant difference between the 
effectiveness or safety of either dose of dabigatran and enoxaparin. 
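
As an illustration, the percentages quoted above for RE-MODEL can be 
reproduced in Python, together with a simple $\chi^2$ test on the 
corresponding 2 × 2 table. This is a sketch only: it is not necessarily the 
analysis the study investigators carried out. It uses a shortcut that is 
algebraically equivalent, for 2 × 2 tables, to summing 
(Observed − Expected)²/Expected over the four cells, and compares the 
result with 3.841, the 5% critical value for one degree of freedom. The 
function and variable names are assumptions made for this example. 

```python
def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for the 2 x 2 table [[a, b], [c, d]],
    using the shortcut n(ad - bc)^2 / (product of the four margins)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# RE-MODEL, blood clots or death (Table 9)
dab_events, dab_n = 183, 503      # 220 mg dabigatran
enox_events, enox_n = 193, 512    # 40 mg enoxaparin

dab_pct = 100 * dab_events / dab_n
enox_pct = 100 * enox_events / enox_n
print(round(dab_pct, 1), round(enox_pct, 1))    # 36.4 37.7

statistic = chi_squared_2x2(dab_events, dab_n - dab_events,
                            enox_events, enox_n - enox_events)
print(round(statistic, 3))
print(statistic > 3.841)    # False: no significant difference at the 5% level
```

The statistic is far below the critical value, which agrees with the 
investigators' finding of no statistically significant difference. 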


In addition to the three phase 3 studies mentioned above, a further phase 3 
study, called RE-LY, looked at the effectiveness of dabigatran in 
preventing stroke in patients with atrial fibrillation (a heart condition 
which, as mentioned right at the start of this case study, is one where it 
was hoped dabigatran would be useful). (Connolly et al., 2009, and see 
also correction in Connolly et al., 2010.) This study, initiated in 2005, was 
a very large worldwide study in which 18118 atrial fibrillation patients 
were randomised to one of two dabigatran dosing schedules or to a control 
group. The dabigatran dosing schedules were 110mg or 150 mg, twice 
daily, while the control group were given the most commonly used existing 
anticoagulant treatment for this condition, warfarin. The study design was 
like a cohort study: patients were followed over time, for a median of two 
years, with several follow-up visits initially, then every four months until 
the end of the study. Treatments were not blinded. Data were collected on 
all medical conditions, however minor. 


The main aim of the study was to evaluate the effectiveness of dabigatran 
at reducing the risk of stroke, and to look at the safety of long-term use of 
the drug with a particular emphasis on bleeding. The trial took several 
years to run, and results were published in 2009. We will not show a table 
of the results here as they are more difficult to interpret: the length of 
time in the study differed between patients, so person-time had to be taken 
into account. The main findings were: 


• At a dose of 110 mg twice daily, dabigatran was as effective as warfarin 
  in reducing the risk of stroke, with a lower risk of major bleeds. 


• At a dose of 150 mg twice daily, dabigatran was more effective than 
  warfarin in reducing the risk of stroke, with a similar overall risk of 
  major bleeds. The risk of any type of bleeding was lower, but the risk 
  of gastro-intestinal (stomach and intestine) bleeding was higher. 
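
The idea of taking person-time into account, mentioned before the list of 
findings, can be illustrated with a small Python sketch. The follow-up 
times and event count below are entirely hypothetical and are not taken 
from RE-LY; the function name is also an assumption. The point is only 
that events are divided by the total time at risk rather than by the 
number of patients. 

```python
def rate_per_100_person_years(n_events, follow_up_years):
    """Event rate per 100 person-years, where follow_up_years lists the
    time each patient spent in the study."""
    person_years = sum(follow_up_years)
    if person_years <= 0:
        raise ValueError("total follow-up time must be positive")
    return 100 * n_events / person_years

# Hypothetical data: five patients followed for different lengths of time,
# with one event observed among them
follow_up = [2.0, 1.5, 0.5, 2.0, 2.0]    # years in the study
rate = rate_per_100_person_years(1, follow_up)
print(rate)    # 12.5 events per 100 person-years
```

Dividing by the number of patients instead would give 20 events per 
100 patients, which ignores the fact that some patients were observed for 
much less time than others. 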


6.4 Marketing authorisation and use 


During 2007, the CHMP considered the evidence from the RE-MODEL 
and RE-NOVATE trials in deciding whether or not to recommend 
dabigatran for licensing in Europe. The CHMP concluded that dabigatran 
was as effective as enoxaparin in preventing blood clots, and safety profiles 
were similar. It was noted that dabigatran was more convenient for 
patients than enoxaparin because it is taken orally, rather than given as an 
injection. 

The CHMP considered that the benefits of dabigatran outweighed its risks 
and recommended to the European Commission that it be given marketing 
authorisation for use after hip and knee replacement surgery. This was 
granted in March 2008. After completion of the RE-LY study, marketing 
authorisation for stroke prevention in patients with atrial fibrillation 
followed in August 2011. 


Dabigatran is convenient to use as it can be taken orally in a capsule that 
acts immediately and requires no monitoring. Partly for this reason, it is 
considered to be an important new anticoagulant. Two alternative 
anticoagulants have been mentioned, enoxaparin and warfarin. Enoxaparin 
is available only as an injection, making it a more invasive treatment. 
Warfarin is taken orally, but its use requires an initial phase of 
normalisation which requires frequent blood tests. Standard practice is to 
give patients an anticoagulant for several weeks after hip or knee 
replacement, but their hospital stay might be less than five days. It follows 
that dabigatran is more convenient for both patients and medical staff. 


The cost-effectiveness of dabigatran for use after hip or knee replacement 
was reviewed by NICE during 2008. While dabigatran is a more expensive 
drug than other existing treatments, its convenience means it takes up less 
clinical time. NICE approved dabigatran as an available option for 
preventing blood clots after hip or knee replacement surgery on the NHS 
(NICE, 2008). It also recommended further clinical trials to compare 
dabigatran with other existing anticoagulants. 





6.5 Surveillance 


Phase 4 studies use larger samples of patients than can be obtained 
before marketing. They aim to obtain further evidence about the 
safety of the drug. 


At the end of 2011, the EMA published a press release (EMA, 2011) on the 
safety of dabigatran. There were worldwide reports of bleeding on the 
drug, which in a few cases led to death. Excessive bleeding is a well-known 
risk with any anticoagulant, and evidence from the trials suggested 
dabigatran had a similar safety profile to other anticoagulants. It was 
noted that with increasing worldwide use and awareness, the total number 
of adverse events reported tends to increase. However, the EMA 
recommended that further checks of a patient’s health should be carried 
out before the patient is prescribed dabigatran. The EMA will continue to 
monitor the safety of the drug. 


Exercises on Section 6 





Exercise 8 Analysing the RE-NOVATE trial 


Table 10, which gave results of the RE-NOVATE trial, is reproduced 
below. Carry out a $\chi^2$ test to test the null hypothesis of no difference 
between effectiveness in reducing the risk of blood clots (or death) when 
treated with 220 mg dabigatran and 40 mg enoxaparin. Start by forming a 
suitable 2 × 2 contingency table. You may carry out your calculations 
either by hand or using Minitab. 


                                   Drug and dose 
                              dabigatran          enoxaparin 
                          220 mg      150 mg        40 mg 
Major bleeds              23/1146     15/1163       18/1154 
Minor bleeds              70/1146     72/1163       74/1154 
Blood clots or death      53/880      75/874        60/897 





Computer Book: clinical trials 


In Section 3 you learnt about crossover trials, matched-pairs trials and 
group-comparative trials. Chapter 11 of the Computer Book tells you how 
to use Minitab to randomise patients to treatments for these clinical trials. 
It then gives you more practice at using Minitab to carry out $\chi^2$ tests 
and t-tests. 


You should work through all of Chapter 11 of the Computer Book now, if 
you have not already done so. 


Summary 


In this unit we looked at the testing of new drugs. The tests have two 
main aims: to determine whether a drug is an effective treatment and to 
make sure it is safe, with no serious adverse side effects. New drugs are 
tested in clinical trials that use a control group. 


Section 2 explained the difficulties of setting up effective controls, and how 
some of these difficulties can be overcome by using placebos. It 
distinguished subjective measurements, such as sensations, from objective 
measurements, to which relatively precise numerical values can be 
attached. Bias can arise in experiments from the expectations of both the 
experimenters and the subjects, but it can be removed by blind and 
double-blind experiments. Problems also arise from the variability of the 
experimenters and of the people who are the subjects. 


Different types of clinical trial were described in Section 3. One type is the 
crossover trial, where a patient receives the control (placebo) treatment 
in one time period and the new drug in a different time period. 
A second type is a matched-pairs trial, where each person given the new 
drug is matched with a second person who is given the control treatment. 
Both these forms of trial aim to remove some of the variability between 
people, either by using each person as his or her own control, or by 
matching people with similar attributes. A third type of clinical trial is the 
group-comparative trial. Here people are allocated to the control group 
and treatment group at random, subject to the restriction that each group 
should be representative of the population being studied. 


Section 4 showed that some medical conditions and ailments require 
categorical measurements, others ordinal measurements and others are 
measured on an interval scale. Different statistical tests need to be used for 
data from these different types of measurement. This unit illustrated the 
use of t-tests and $\chi^2$ tests in clinical trial analysis. 


The problems of detecting the side effects of a drug were discussed in 
Section 5. Some side effects are expected and these can be measured by 
experimental procedures similar to those used to measure the desired 
effects of a drug. Other side effects may not be expected, as in the case of 
practolol, a drug developed for treating heart conditions which turned out 
to damage the eye. Clinical trials are necessarily limited in scale and 
duration, so they may not reveal all the side effects of a drug. 
Post-marketing surveillance is therefore important, and this section 
described various methods of post-marketing surveillance, together with 
problems that are involved in each. 


Section 6 worked through all the major stages in the testing of a new drug. 
Perhaps the overriding messages from this case study are that study 
designs can be complex and results may not be clear-cut. Also, the process 
of drug licensing is slow — first reports of dabigatran were published in 
2001, yet it took until 2008 to license the drug. 


Learning outcomes 




After working through this unit, you should be able to: 


• understand the need for controls in a clinical trial and how to control 
  for certain factors 

• explain what is meant by a placebo and explain how and why placebos 
  are used in clinical trials 

• explain some of the ethical questions that have to be answered when 
  assessing whether a group of patients can be given a placebo 

• distinguish between subjective and objective measurements in clinical 
  trials 

• identify sources of bias in clinical trials and suggest experimental 
  procedures whereby bias can be reduced 

• identify sources of variability in clinical trials and recognise when this 
  variability can hamper the interpretation of the results from a clinical 
  trial 

• describe crossover, matched-pairs and group-comparative designs of 
  clinical trials 

• recognise those investigations in which each of these designs should be 
  used and those in which they should not 

• understand some uses of random methods of allocation in the design 
  of an experiment, and their importance 

• understand that the design and analysis of a clinical trial (or any 
  scientific experiment) are closely connected 

• distinguish between categorical, ordinal and interval scale 
  measurements, and understand their uses in clinical trials 

• analyse categorical data from trials using the $\chi^2$ test 

• analyse interval scale data from trials using the t-test 

• explain why clinical trials may fail to detect all the side effects of a 
  new drug 

• describe some methods of post-marketing surveillance, together with 
  their advantages and drawbacks. 


Solutions to activities 


Solution to Activity 1 


The answer is nothing at all. Headaches tend to go away sooner or later 
anyway. Perhaps your headache would have been gone in an hour even if 
you had not taken the drug. 


Solution to Activity 2 


You cannot say much more than in the previous activity. You are more 
likely to believe that aspirin really works if 16 out of 20 headaches improve 
rather than if none improve, but you still do not know how many of the 
headaches would have got better anyway. 


Solution to Activity 3 


Each member of the experimental group has taken a pill, whereas each 
member of the control group has not. This may appear to be a trivial 
difference, but it is not. Every doctor knows that the mere fact of taking a 
pill can have beneficial effects, irrespective of any active ingredients that 
the pill may contain. 


Solution to Activity 4 


The control group can be given dummy tablets which look like aspirin and 
taste like aspirin but contain no active drug. 


Solution to Activity 5 


Scurvy is a serious illness that can result in death. Hence it would not 
have been ethical to give a placebo — all patients needed to be treated. 


Solution to Activity 6 
Some of the problems are as follows. 


• With either treatment, the headache may take more than an hour to
  go away, but last longer when the placebos are taken instead of the
  aspirin.

• With either treatment, the headache might be gone after an hour.
  However, the headache may have gone away in 5 minutes with the
  aspirin and in 45 minutes with the placebos, for example.

• The headache may be still there but may not be as severe with the
  aspirin as with the placebos.


Solution to Activity 7 


It probably does not. If a drug gives only a degree of relief which is not 
easily noticeable, then it will not be very useful. Thus it is only quite 
substantial differences that are likely to be of interest. 


Solution to Activity 8 


It certainly does matter, because the people will expect the active tablet 
to work and the dummy tablet to be ineffective. Thus their expectations 
of the treatment will be quite different in the two cases. So it is important 
to make sure, as far as possible, that the subjects do not know which 
treatment they are receiving. 


Solution to Activity 9 


It still matters, because the doctors also have expectations. They will 
probably expect the active treatment to be better than a placebo, and 
their expectations can often affect how the patients respond to the 
treatment. 


Solution to Activity 10 


If all treatments were equal, you would expect the two patients given 
sea-water to take longer to recover from the scurvy, so the results from 
this group may appear worse than the other treatment groups. 


Solution to Activity 11 


An independent person could have assigned and given the treatments in a 
private space. Then James Lind would have been blinded to the 
treatments. Blinding patients to what treatment they themselves received 
would not have been possible, but patients could have been blinded to the 
treatments that other patients received. 


Solution to Activity 12 


It would be unsatisfactory because people are variable. Not only do 
different people vary in the extent to which aspirin reduces their 
headaches, but an individual’s response to aspirin can vary from one time 
to another. For example, perhaps with some people the aspirin produces 
no relief if the headache is very severe, but banishes it altogether if the 
headache is slight. 


Solution to Activity 13 


Measurements should be taken at specific times before and after eating. In 
studies of drugs designed to control glucose levels in diabetics, it is normal 
to take blood glucose or similar measurements after fasting as a baseline. 


Solution to Activity 14 


The measurements for men and women are very similar, so there is little 
need to ensure that the control and experimental groups are of the same 
sex. However, the maximum heart rate decreases quite noticeably with 
age, so it would be important to ensure that the experimental and control 
groups were either of similar age or contained similar proportions of old 
and young people. 


Solution to Activity 15


It assumes that the effect of the drug is reversible, i.e. that the patient is 
essentially the same after the treatment has been withdrawn as before it 
was started. In practice, this assumption would need to be tested and it 
will not hold if, for instance, the patient is cured by the treatment. 


Solution to Activity 16 


(a) A crossover trial can be used here. Diabetes is a chronic condition 
that patients will have for the duration of the trial. 


(b) It would not be appropriate to use a crossover design here as scurvy 
could be cured (or have resulted in death) prior to receiving the 
second diet supplement. 


Solution to Activity 17 


Matching first on sex, then smoking status, then age, gives the following 
pairs: 1 and 5, 2 and 4, 3 and 7, 6 and 8. If the relative importance of sex, 
smoking status and age changes, these matched pairs will change. 


Solution to Activity 18 


By allocating patients to the two groups at random. 
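
One way random allocation might be carried out is sketched below in Python. This is only an illustration: the function name `allocate_at_random`, the patient numbering 1 to 20 and the seed are all assumptions made for the example, not details taken from the unit.

```python
import random


def allocate_at_random(patient_ids, seed=None):
    """Split patients into two groups of (nearly) equal size purely by chance.

    Hypothetical helper for illustration; a real trial would follow a
    documented randomisation procedure.
    """
    rng = random.Random(seed)      # seeded so the allocation can be reproduced
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Twenty hypothetical patients, numbered 1 to 20.
experimental, control = allocate_at_random(range(1, 21), seed=140)
print("Experimental group:", sorted(experimental))
print("Control group:     ", sorted(control))
```

Because chance alone decides each patient's group, neither the experimenter's judgement nor the patients' characteristics can systematically influence who receives which treatment.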


Solution to Activity 19 


The null hypothesis would be that the effect of the new drug is not 
different from the effect of the existing drug. The alternative hypothesis 
would be that the effect of the new drug is different from the effect of the 
existing drug, i.e. that it relieves headache pain either more or less 
effectively. 


Solution to Activity 20 


A two-sided test would be more appropriate. There is a chance that the 
existing drug will be better than the new drug (the existing drug has 
presumably proved better than competitive drugs in the past), but the 
drug company hopes that the new drug will be the better drug. They wish 
to detect differences between the new and the existing drug in both 
directions. Therefore a two-sided test should be used. 


Solution to Activity 21 


(a) Body temperature can be measured on an interval scale, giving 
interval scale data. 


(b) As pain can be rated (from mild to severe), these are ordinal data. 


(c) There are (two) categories, pregnant and non-pregnant, so the 
information gives categorical data. 


(d) Blood pressure measurements give interval scale data. 


Solution to Activity 22 


Yes, there is a major difference in the two selection procedures. In 
post-marketing surveillance, people are allocated to groups after the 
decision is taken as to which patients should be treated with the particular 
drug. 


Solution to Activity 23 


Combining results for the three groups gives the following contingency
table:

                       Total daily dose
                 <100 mg   150 or 200 mg   >300 mg   Total

Blood clot          11            9            8       28
No blood clot       61           55           81      197

Total               72           64           89      225


The null and alternative hypotheses are: 


H₀: Total daily dose of drug and the number of patients having
a blood clot are independent.


H₁: There is a relationship between the total daily dose of drug and
the number of patients having a blood clot.


The Expected table is as follows:

                       Total daily dose
                 <100 mg   150 or 200 mg   >300 mg

Blood clot        8.9600       7.9644      11.0756
No blood clot    63.0400      56.0356      77.9244


Note that the Expected values are greater than 5, so it is appropriate to
use the χ² test.

Residual = Observed − Expected, so the following is the Residual table.

                       Total daily dose
                 <100 mg   150 or 200 mg   >300 mg

Blood clot        2.0400       1.0356      −3.0756
No blood clot    −2.0400      −1.0356       3.0756


For the first cell, the contribution to χ² is given by:

\[
\chi^2 \text{ contribution} = \frac{(\text{Residual})^2}{\text{Expected}} = \frac{(2.0400)^2}{8.9600} \approx 0.4645.
\]

Repeating the calculation for all six cells results in this χ² table:

                       Total daily dose
                 <100 mg   150 or 200 mg   >300 mg

Blood clot        0.4645       0.1347       0.8541
No blood clot     0.0660       0.0191       0.1214

Hence the value of the χ² test statistic is
0.4645 + 0.1347 + 0.8541 + 0.0660 + 0.0191 + 0.1214 ≈ 1.660.


As in Example 14, the number of degrees of freedom is
(2 − 1) × (3 − 1) = 2 so, from Table 3 (Exercises on Section 4), the critical
value at the 5% significance level is 5.991 and at the 1% significance level
it is 9.210.


Since the test statistic, 1.660, is less than the 5% critical value, 5.991, we 
cannot reject H₀ at the 5% significance level. Hence we conclude that the
data provide little evidence of a relationship between the total daily drug 
dose and the number of patients getting a clot. 
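
The calculation above can be checked with a short Python sketch. The function name `chi_squared` is an illustrative assumption, not something from the unit. The observed counts are those consistent with the column totals 72, 64 and 89, and the critical value 5.991 is copied from the text rather than computed.

```python
def chi_squared(observed):
    """Return the chi-squared statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    overall = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected = row total x column total / overall total
            expected = row_totals[i] * col_totals[j] / overall
            # Contribution = (Observed - Expected)^2 / Expected
            statistic += (obs - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Rows: blood clot, no blood clot; columns: the three dose groups.
observed = [[11, 9, 8],
            [61, 55, 81]]

CRITICAL_5_PERCENT = 5.991  # value quoted in the text for 2 degrees of freedom

statistic, df = chi_squared(observed)
print(f"chi-squared = {statistic:.3f} with {df} degrees of freedom")
print("Reject H0 at 5% level:", statistic > CRITICAL_5_PERCENT)
```

The sketch works from unrounded Expected values, so its result may differ slightly in later decimal places from a hand calculation that rounds each cell first.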


Solution to Activity 24 


As noted at the start of this section, patients who have undergone a hip or 
knee replacement are at high risk of blood clots. Blood clots can be 
dangerous so it would be unethical to give such patients a placebo when 
there are known, established treatments for reducing the risk. 


Solution to Activity 25 


Patients in the treatment group were given a placebo injection in addition 
to the dabigatran capsule, and patients in the control group were given a 
placebo capsule in addition to the enoxaparin injection. 


Solution to Activity 26 


(a) The same questions that were examined in BISTRO I should be 
examined in BISTRO II. Thus the questions to examine are whether 
the daily dose of drug affects the number of patients having (i) a 
minor bleed, (ii) a major bleed and (iii) a blood clot. 


BISTRO I found evidence that the number of patients having a minor 
bleed was affected, so we would expect BISTRO II to replicate that 
result. BISTRO II is a larger trial than BISTRO I, so major bleeds are
more likely to occur. (There were none in BISTRO I.) Also,
because it is a larger trial it may find evidence that drug dosage 
affects the number of patients suffering a blood clot. 


The null hypotheses would be: 


(i) H₀: Total daily dose of dabigatran and the number of patients
having a minor bleed are independent.


(ii) H₀: Total daily dose of dabigatran and the number of patients
having a major bleed are independent.


(iii) H₀: Total daily dose of dabigatran and the number of patients
having a blood clot are independent.


(b) The primary aim in comparing the two drugs is to discover whether 
dabigatran is better than enoxaparin. That is, is there a daily dose 
level of dabigatran for which it is better than enoxaparin? Thus, for 
the first dose regime, the question is whether taking 50 mg doses of 
dabigatran twice a day is better than taking enoxaparin. Here, 
‘better’ relates to the number of patients having (i) a minor bleed,
(ii) a major bleed and (iii) a blood clot. 


The corresponding hypotheses are: 


(i) H₀: The drug taken (50 mg of dabigatran twice a day or
enoxaparin) and the number of patients having a minor bleed are
independent.


(ii) H₀: The drug taken (50 mg of dabigatran twice a day or
enoxaparin) and the number of patients having a major bleed are
independent.


(iii) H₀: The drug taken (50 mg of dabigatran twice a day or
enoxaparin) and the number of patients having a blood clot are
independent.


The question would be repeated for each of the other dose levels of 
dabigatran that were examined in the trial (150 mg twice a day, 
300 mg once a day and 225 mg twice a day). 


Solutions to exercises


Solution to Exercise 1 


(a) Weight loss can be measured on an objective scale, in kilograms per
week, for example.


(b) Certain aspects of appetite can be measured only on a subjective
scale. Researchers might wish to know, for example, whether a drug
affected people’s subjective impression of their own appetite. On the
other hand, it is possible to set up objective measures which might
accurately reflect these subjective impressions: for example, the
amount of food eaten (measured by weight or calorie content) over a
certain period of time.


(c) This would probably be measured on a subjective scale. People might
be asked to rate the soreness of their throat on a scale of discomfort,
or doctors might assess the soreness of the throat on the intensity of
the inflammation, or the size of the inflamed area. The inflammation
could in theory be measured objectively, but in practice it would
probably not be measured in this way.


(d) This would be measured on a subjective scale. People would probably
be asked to rate the severity of their discomfort on a numerical scale.


Solution to Exercise 2 


(a) The severity and nature of the depression, and the individual’s
history of depression, may vary greatly. For example, some
individuals might be suffering from long-lasting and extremely severe
depression, whilst others might be suffering from short-lived and
moderate depression. The causes, and therefore the cure, of the
depression might thus differ. A woman suffering from post-natal
depression might, for example, respond to a drug very differently from
a woman suffering from a bereavement. A person with a long history
of depressive illness might respond to a drug very differently from a
person who is experiencing depression for the first time.


(b) People who volunteer to take part in a clinical trial might not be
typical of people in general suffering from depression. For example,
severely depressed people might not be willing to volunteer at all.
Hence the results of these trials on volunteers might not be
representative of the results that would be obtained in the population
of depression-sufferers in general.


(c) The problem outlined in part (a) would have to be overcome by
making sure that the experimental and control groups were as similar
as possible with respect to the severity, nature and history of their
depression. The problem outlined in part (b) would have to be
overcome by selecting subjects for the clinical trial who were
representative of the population of depression-sufferers in general, if
this is ethically possible.


Solution to Exercise 3


Despite the use of standardised questionnaires, the patients’ responses 
may be markedly affected by their attitude towards the experimenters, 
and this attitude is likely to vary from one patient, and one experimenter, 
to another. For example, the answers that an aggressive experimenter 
obtained might be different from those that would have been obtained 
from the same patient by a more polite experimenter. 


Solution to Exercise 4 


(a) This is an ideal example of a matched-pairs design, but one that is 
extremely difficult to set up, owing to the difficulty of finding enough 
pairs of identical twins suffering from the condition of interest. 

(b) This is a group-comparative design. 


(c) This is a crossover design; each individual crosses over during the 
course of the experiment from one treatment to the other. 


Solution to Exercise 5 


(a) A crossover design would be most appropriate here, provided that 
there was no carry-over effect. 


(b) The only possible option here would be a group-comparative design 
because it would not be ethical to use the extra time required to set 
up a matched-pairs trial. 


(c) A matched-pairs design could be the most appropriate here because it 
might be possible to obtain a lot of volunteers for the trial and to 
select pairs that match for age, sex, how heavily they smoke, etc. If 
enough matched pairs could not be found, then a group-comparative 
design is best, probably using stratified randomisation so that the 
control group and treatment group have similar characteristics for 
factors thought important. 


Solution to Exercise 6 
(a) The data are categorical. 


(b) The data are tabulated in a suitable contingency table below. 


                 Experimental group   Control group   Total

Conceived                  1                 15           16
Not conceived            999                985         1984

Total                   1000               1000         2000


(c) Null hypothesis (H₀): The effects of the new and old contraceptive on
conception are the same.


Alternative hypothesis (H₁): The effects of the new and old
contraceptive on conception are different.


Expressing these in a form suitable for the χ² test involves looking at
the experiment in a slightly different way. The drug company was
interested in determining whether the type of contraceptive used has
any effect on the number of women who conceived during the one-year
period. We should therefore set up the null and alternative hypotheses
in terms of these variables as follows.


H₀: The type of contraceptive used and the number of women
who conceived during the one-year period are independent.


H₁: There is a relationship between the type of contraceptive
used and the number of women who conceived during
the one-year period.


(You may have set up the null and alternative hypotheses using 
slightly different wording. This does not matter as long as you have 
labelled the variables clearly and the hypotheses convey the same 
meaning.) 


(d) The Expected values are calculated using the method of Unit 8
(Subsection 4.2),

\[
\text{Expected} = \frac{\text{row total} \times \text{column total}}{\text{overall total}},
\]

which gives the following table of Expected values.

                 Experimental group   Control group

Conceived                  8                  8
Not conceived            992                992


Notice that all the Expected values are greater than 5, so it is
appropriate to use the χ² test.


The Residuals are obtained from
Residual = Observed − Expected.


Thus, for example, the Residual for the first cell is 1 − 8 = −7. The
following is the Residual table.

                 Experimental group   Control group

Conceived                 −7                  7
Not conceived              7                 −7


For the first cell, the contribution to χ² is given by

\[
\chi^2 \text{ contribution} = \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(\text{Residual})^2}{\text{Expected}} = \frac{(-7)^2}{8} = 6.125.
\]

Repeating the calculation for all four cells results in this χ² table:

                 Experimental group   Control group

Conceived              6.125              6.125
Not conceived          0.0494             0.0494


where the χ² contributions for the ‘Not conceived’ groups have been
rounded to four decimal places. 
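
As a check on the smaller entries, the same formula can be applied to a ‘Not conceived’ cell, where the Residual has magnitude 7 and the Expected value is 992:

\[
\frac{(\text{Residual})^2}{\text{Expected}} = \frac{7^2}{992} = \frac{49}{992} \approx 0.0494.
\]

The squared Residual is the same in every cell of this table. The much larger Expected value in the ‘Not conceived’ row is what makes those contributions so small.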


Hence the value of the χ² test statistic is
6.125 + 6.125 + 0.0494 + 0.0494 ≈ 12.349.


The number of degrees of freedom for a 2 × 2 contingency table is
(2 − 1) × (2 − 1) = 1.


Hence, from Table 3, the critical value at the 5% significance level is
3.841, and at the 1% significance level it is 6.635.


Since the test statistic, 12.349, is greater than the 1% critical value,
6.635, we reject H₀ in favour of H₁ at the 1% significance level and
conclude that there is strong evidence of a relationship between the 
type of contraceptive used and the number of women who conceived in 
the one-year period. In the terminology of Subsection 4.2, the new and 
old contraceptives have significantly different effects on conception. 


Looking at the contingency table in (b), you can see that the new 
contraceptive results in fewer conceptions. This, together with the 
result of the hypothesis test, means that the drug company should be 
satisfied that the new contraceptive was better than the existing one 
against which they tested it. These results would also be useful 
evidence to submit with an application for a product licence, provided 
that they were accompanied by evidence that there were no side 
effects and that the trials had been carried out carefully — for example, 
that both groups of women were using the contraceptives correctly. 
(Many oral contraceptives are effective only if they are taken regularly 
at the same time every day.) 


Solution to Exercise 7 


(a) These data are on an interval scale (of beats per minute). Not only is
it possible to say that a heart rate of 120 beats per minute is larger
than one of 70 beats per minute, it also makes sense to say that it is
50 beats per minute larger.


(b) Since this was a crossover design, the two measurements for each
person form a matched pair. (Each pair of measurements consists of
the heart rates for one person: when given the drug and when given
the placebo.) The sample size is small, so the matched-pairs t-test is
appropriate here.


(c) The necessary assumption is that the population of differences in
heart rate (between the placebo and the drug) follows a normal
distribution.


(d) The hypotheses for a matched-pairs t-test are:

\[
H_0: \mu_d = 0 \quad \text{and} \quad H_1: \mu_d \neq 0,
\]

where d is the difference between the placebo and drug.


The two-sided matched-pairs t-test statistic is found using the 
procedure in Subsection 4.2 of Unit 10. The first step is to calculate 
the differences between the pairs of data values, square them, and 
total these, as in the table below. Here we have subtracted the D 
(drug) values from the P (placebo) values. You may have subtracted 
the P values from the D values, in which case your solution will differ 
slightly from ours. However, you should end up with the same 
conclusion. 


           Heart rate (beats per minute)     Difference

Subject       Placebo       Drug              d       d²
1               105          90              15      225
2                88          82               6       36
3                90          95              −5       25
4                89          80               9       81
5                80          88              −8       64
6               110          75              35     1225
7                84          83               1        1
8               100          90              10      100
Σ               746         683              63     1757

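The mean of the differences is

\[
\bar{d} = \frac{\sum d}{n} = \frac{63}{8} = 7.875.
\]

Also

\[
s^2 = \frac{1}{n-1}\left(\sum d^2 - \frac{(\sum d)^2}{n}\right) = \frac{1}{7}\left(1757 - \frac{63^2}{8}\right) = \frac{1}{7}(1757 - 496.125) = 180.125.
\]

Thus, the standard deviation is s = √180.125 ≈ 13.421. Hence,

\[
\frac{s}{\sqrt{n}} = \frac{13.421}{\sqrt{8}} \approx 4.745.
\]

Now, the test statistic is

\[
t = \frac{\bar{d} - 0}{s/\sqrt{n}},
\]

as the null hypothesis is H₀: μ_d = 0. Thus

\[
t \approx \frac{7.875}{4.745} \approx 1.660.
\]

From Table 4, the critical value at the 5% significance level with
n − 1 = 7 degrees of freedom is 2.365.
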


Since the test statistic, 1.660, is less than the critical value, 2.365, we
cannot reject H₀ at the 5% significance level. In the terminology of
Subsection 4.2, the drug and placebo are not significantly different in
their effect on heart rate. Thus, from these data there is little
evidence that the new drug alters people’s heart rate.


(However, the sample of people used is very small, so it might still be 
worth conducting a larger trial with more people.) 
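
The arithmetic of this matched-pairs t-test can be reproduced with a short Python sketch. The variable names are illustrative assumptions, and the critical value 2.365 is copied from the text rather than computed.

```python
import math

# Heart rates (beats per minute) for the eight subjects.
placebo = [105, 88, 90, 89, 80, 110, 84, 100]
drug = [90, 82, 95, 80, 88, 75, 83, 90]

# Differences d = placebo - drug for each matched pair.
diffs = [p - d for p, d in zip(placebo, drug)]
n = len(diffs)

mean_diff = sum(diffs) / n
# Sample variance: (sum of d^2 - (sum of d)^2 / n) / (n - 1)
variance = (sum(x * x for x in diffs) - sum(diffs) ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)
std_error = std_dev / math.sqrt(n)

# Under H0 the population mean difference is 0.
t_statistic = (mean_diff - 0) / std_error

CRITICAL_5_PERCENT = 2.365  # value quoted in the text for 7 degrees of freedom

print(f"mean difference = {mean_diff:.3f}")
print(f"standard deviation = {std_dev:.3f}")
print(f"t = {t_statistic:.3f}")
print("Reject H0 at 5% level:", abs(t_statistic) > CRITICAL_5_PERCENT)
```

Subtracting in the other direction (drug minus placebo) would only change the sign of the test statistic. The sketch compares the absolute value with the critical value, so the conclusion of the two-sided test would be the same.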


Solution to Exercise 8


The contingency table for the data is as follows:

                         Blood clot
                        Yes      No     Total

Dabigatran 220 mg        53     827       880
Enoxaparin 40 mg         60     837       897

Total                   113    1664      1777

The null and alternative hypotheses are: 


H₀: Daily doses of 220 mg dabigatran and 40 mg enoxaparin are
equally effective at reducing the risk of a blood clot.


H₁: Daily doses of 220 mg dabigatran and 40 mg enoxaparin are
not equally effective at reducing the risk of a blood clot.


The Expected table is as follows.

                         Blood clot
                        Yes         No

Dabigatran 220 mg     55.9595    824.0405
Enoxaparin 40 mg      57.0405    839.9595


The Expected values are greater than 5, so it is appropriate to use the χ²
test.


Residual = Observed − Expected, so the following is the Residual table.

                         Blood clot
                        Yes         No

Dabigatran 220 mg     −2.9595      2.9595
Enoxaparin 40 mg       2.9595     −2.9595


The χ² table is:

                         Blood clot
                        Yes         No

Dabigatran 220 mg      0.1565      0.0106
Enoxaparin 40 mg       0.1536      0.0104


The χ² statistic is therefore
χ² = 0.1565 + 0.0106 + 0.1536 + 0.0104 ≈ 0.331.


The number of degrees of freedom for a 2 × 2 contingency table is
(2 − 1) × (2 − 1) = 1. So, from Table 3 (Exercises on Section 4), the
critical value at the 5% significance level is 3.841. As 0.331 < 3.841 we do
not reject the null hypothesis at the 5% significance level. There is little
evidence that daily doses of 220 mg dabigatran and 40 mg enoxaparin differ
in their effectiveness at reducing the risk of a blood clot.
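
An informal way to see why the test finds so little evidence is to compare the observed proportions of patients with a blood clot in the two groups. This is simple arithmetic on the contingency table for this exercise, not part of the χ² procedure itself:

\[
\frac{53}{880} \approx 0.0602 \quad \text{(dabigatran)}, \qquad \frac{60}{897} \approx 0.0669 \quad \text{(enoxaparin)}.
\]

The two proportions differ by less than one percentage point, which is consistent with the small value of the test statistic.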